Variables & Types (Under the hood)

This post delves a little deeper into variables in both Python and Kotlin.

Nobody Expects….

In many languages, the simplest variables operate like a calculator memory: A value is stored in the memory, and can be retrieved later. Python simply does not work this way, and until you realise that, some results are unexpected. Consider the following code.

>>> a = [1, 2, 3]
>>> b = a
>>> b
[1, 2, 3]
>>> b[0] = 5
>>> b
[5, 2, 3]
# b has changed, as expected
>>> a
[5, 2, 3]
# but a is also changed, was that expected? 

It turns out that a does not contain the list [1, 2, 3] the way a calculator memory holds a number, but rather ‘points to’ the list with the value [1, 2, 3], and setting b = a results in b pointing to the same list. This means a and b are now sharing the same list, so changing the list pointed to by b also changes the list pointed to by a. Hence the result above.

Pointers in Python (and Kotlin)

So, as shown in the example above, variables in Python are all what is known in the language C as pointers. The value of the variable is the value at the location that is ‘pointed to’. Some operations create a new value at a new location, which then requires the variable to point to the new location. Experiments with the location are easy in Python. The function ‘id’ will give the identity of any value (also known as an object, as in Python all values are objects); in the standard CPython interpreter this identity is the object’s memory address.

>>> a = [1, 2, 3]
>>> id(a)  # find where 'a' is pointing
# note: the actual id value printed will change depending on what has run before
>>> b = a
>>> id(b)  # prints the same number as id(a)
>>> b is a  # 'is' compares the id() of each
True
>>> b
[1, 2, 3]
>>> b[0] = 5
>>> b is a  # are they still the same list?
True
>>> b
[5, 2, 3]
# the data at b has changed, as expected
>>> id(b)  # even though the id of b is still the same
>>> a  # so the value at a is still the same list as b
[5, 2, 3]
>>> b = b + [4]
>>> id(b)
# ok....b is now pointing to a different place, so this made a new list!
>>> b is a  # are b and a still the same?
False
>>> b  # does the new list have the right value?
[5, 2, 3, 4]
>>> a  # a should not have been changed, still the old list
[5, 2, 3]
>>> id(a)  # unchanged from the start

Hopefully the above illustrates how Python variables work, and Kotlin variables behave in basically the same way, with any differences in behaviour explained later in this page.

The behaviour is: The variable contains the `id` of an object in memory. Some calculations can change the value at that id, but most calculations will produce a result stored in a new location, and change the id value of the variable to point to this new location. So how do we know which calculations do which of these two options? Keep reading 🙂 First the background of the next topics, then shared modifications.
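As a small preview of that distinction, here is a minimal sketch using only built-in lists. The names `original` and `alias` are made up for illustration. It contrasts a method call that changes the existing object with an expression that builds a new one.

```python
# Two names for one list object
original = [1, 2, 3]
alias = original

# list.append changes the existing object, so the id stays the same
id_before = id(alias)
alias.append(4)
same_object_after_append = id(alias) == id_before
print(same_object_after_append)   # True

# "alias + [5]" builds a brand-new list, and "alias =" points the name at it
alias = alias + [5]
same_object_after_concat = alias is original
print(same_object_after_concat)   # False

print(original)  # [1, 2, 3, 4]
print(alias)     # [1, 2, 3, 4, 5]
```

The append shows up in both names because there is only one list at that point; the concatenation only affects `alias` because the name is moved to a new list.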

Primitive Values vs Reference/Pointer Variables.

Primitive Types as used in Java, C, C++ etc

The concept of primitive values is simple. Allocate an area of memory to hold the value. Call the allocated area of memory a ‘variable’ and give it a name. To change the value of the variable, change the value in the memory. As each variable has its own memory, there are no side effects of changes to other variables. In Java and C, data of the basic types: Boolean (true/false), numeric values (float, integer, long integer etc) and characters all work by this ‘fixed area of memory allocated to each variable’ approach, as do C character strings held in fixed-size arrays.

With primitive types, the classic a = b requires copying the data from the memory allocated to b to the memory allocated to a.

Reference/Pointer Variables in Java, C, C++ etc

All those ‘Primitive’ types were a fixed size for every possible value, but almost all data other than the primitive types has the potential to change in size when the value changes. Consider a string that starts with the value “Hello” but may later change to have the value “Hello World”. Lists and dictionaries can grow dynamically. Object values can vary in size either when the object contains values that vary in size, or even just when there may be sub-classes. If a variable is declared as class ‘Shape’, then it could be set to an object of the sub-class ‘CircleShape’, which could require additional memory.

Clearly pre-allocating the memory is problematic. Either that memory must allow for the value that uses the largest possible amount of space, or the memory could overflow. The solution is to use ‘the heap’, an area of memory managed to make blocks of memory of the requested size available on demand. However, this means that a string that changes from “Hello” to “Hello World” will change which area of memory is used when the value changes in size. A variable, or allocated memory area for the variable, is needed just to keep track of where the value is located. Fortunately, this new memory requirement is just like that of primitive values: it does not need to change in size, and so ‘pointer’ can become a new ‘primitive’ variable type. The size needed by a ‘pointer’ can be fixed at the size needed to address the computer’s memory.

Unlike with primitive values which are stored in the actual variable, these values are now stored in the ‘heap’, and the location of the value is stored as the primitive ‘pointer’ value in the area uniquely allocated to that variable.

The classic ‘a = b’ requires copying the pointer value of b to the pointer value of a, and then both a and b will ‘point’ to the same value. This means modifying the value pointed to by a will also change b, and vice versa.

Pointer? Reference? Id?

Almost every language has this concept I have described as ‘pointer’, but the name changes. Java uses ‘reference’, Python uses ‘id’. This change of name also reflects what the language allows the programmer to do with these ‘pointers’. In C, pointers are fully accessible to the programmer as regular variables. The programmer can set the pointer to any value they choose, effectively allowing the pointer to specify any location on the heap as where the value should be: half way through a value, or even in an area not allocated at this time. Very powerful, but also very dangerous. Many other languages use another term (such as reference or id) to make the distinction that the language itself sets the ‘pointer’ to allocated areas, and ensures the pointer is always set to the start of an allocated area.

Primitive Values in Python

As discussed above, this means two types of values, and thus two types of variables: Primitive and Reference/Pointer.

Python simplifies things by making all primitives (other than pointer) into reference variables. This also allows all data values to be objects. In C and Java, primitive types are not objects, so this allows Python to be in some ways a more ‘pure’ object oriented language, as everything is an object. Pointers (or ids in Python speak) are managed by the language, so while it does become important to understand the existence of the pointers making this work, from a programming point of view all variables are reference variables. Since all variables use pointers, and all pointers require the same amount of memory, a variable can now dynamically point to any type of value. The fact that a float may require more room than an int no longer matters, since the actual variable only holds the pointer to the heap, and the value on the heap can be of any length.

Using pointers to values on the heap means that values can be shared, and modifying the value for one variable will change the value for other variables sharing the same value.

The impact of this ‘sharing’ is reduced by making all the values that are ‘primitive’ in other languages immutable.
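A minimal sketch of why sharing an immutable value is harmless, using a plain str. The variable names are illustrative only. Nothing that can be done through one name changes what the other name sees, because a str offers no operation that alters the object itself.

```python
name = "hello"
shared = name                 # second name for the same str object

shouted = name.upper()        # returns a new string; "hello" itself is untouched
print(shared)                 # hello
print(shouted)                # HELLO

try:
    name[0] = "J"             # str does not support item assignment
    modified = True
except TypeError:
    modified = False
print(modified)               # False
```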

Primitive Values in Kotlin.

Kotlin appears very much like Python. Everything is an object, even primitive types use references, and the ‘shared values’ challenge is again reduced by keeping primitive types immutable. At least, all programs in Kotlin behave as if this were true. The compiler, armed with additional information that is not available with Python, such as types, ‘var or val’, and private and other modifiers, has more scope to change how things actually work and sometimes internally makes use of primitive types like C and Java, but still ensures that all behaviour of code is consistent with ‘everything is an object and uses references’. Again, it is immutable primitive values that keep programming simple.

Mutable vs Immutable.

Mutable? Modifiable!

There are two types of values in computing: Mutable and Immutable. Mutable basically means the value can be modified or altered. Both Python and Kotlin use immutable basic data types, while in languages such as Java or C, variables of primitive types are mutable, like the memory of a calculator. Consider the following code:

a = 3
b = a
a += 2
a = "The quick brown fox"
a += " jumps over"

With Mutable Primitive Types

  • a = 3
    • copy the value 3 into the memory for ‘a’
  • b = a
    • copy the value held in the memory for a to the memory for b
  • a += 2
    • add 2 to the value held in the memory for a, modifying that value in place to produce 5
  • a = “the quick brown fox”
    • cannot set a to “the quick brown fox” as the memory area allocated for a is the size for an ‘int’
  • a += “ jumps over”
    • if a had been allocated sufficient memory for the resulting string initially, then “ jumps over” could have been copied on to the end of that string. This would have required a to have been initially declared as a block of memory with sufficient space to hold whatever size string may eventually be required

With Immutable Reference Types

  • a = 3
    • set a to point to the value 3 that is already in memory
  • b = a
    • set the pointer value of b to the same location as the pointer for a; both now reference the same memory area and thus share the same value
  • a += 2
    • since an integer is immutable, a new memory area is allocated. The sum of ‘3+2’ is stored in this new memory area and a is set to reference the new memory area. The initial value ‘3’ is still referenced by b.
  • a = “the quick brown fox”
    • a can be set to point to the memory area allocated for the string, as all ‘pointers’ (called ids in Python) require the same space in memory
  • a += “ jumps over”
    • strings are immutable, so “ jumps over” cannot be added to the original string. A new area of memory large enough for the join of both strings must be allocated, the two strings are then copied into this new area and a is set to point to this new area.
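The immutable-reference steps above can be checked directly in Python. This is a sketch rather than a description of interpreter internals: the extra name `kept` is an assumption added here so that the original string stays observable after the `+=`.

```python
a = 3
b = a
print(a is b)        # True: two names, one int object

a += 2               # the result 5 is a different object; a is rebound to it
print(a, b)          # 5 3
print(a is b)        # False

a = "The quick brown fox"   # a now refers to a str; there is no declared size to outgrow
kept = a                    # second reference to the original string
a += " jumps over"
print(kept)          # The quick brown fox
print(a)             # The quick brown fox jumps over
print(a is kept)     # False
```

Because `b` and `kept` still refer to the old values, the old values are visibly untouched by the operations on `a`.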

Mutable & Immutable Values in Kotlin & Python

Immutable Values in Python

  • int, float, Decimal
  • str
  • bool
  • None
  • tuple

Mutable Values in Python

  • list
  • dict
  • set
  • All instances of classes (other than built in classes)
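A short sketch contrasting the two Python lists above, with illustrative names. It also shows one caveat worth knowing: a tuple is immutable in the sense that it always holds the same objects, but if one of those objects is itself mutable (a list here), that inner object can still change.

```python
point = (1, 2)
try:
    point[0] = 9              # tuples reject item assignment
    tuple_changed = True
except TypeError:
    tuple_changed = False

values = [1, 2]
values[0] = 9                 # lists accept it

mixed = (1, [2, 3])
mixed[1].append(4)            # the tuple still holds the same list object,
                              # but the contents of that list have changed

print(tuple_changed)  # False
print(values)         # [9, 2]
print(mixed)          # (1, [2, 3, 4])
```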

Immutable Values in Kotlin

  • Int, Float, BigDecimal (plus Double etc)
  • String
  • Boolean
  • Unit
  • List, Map, Set
  • Objects and instances of classes with only immutable val properties

Mutable Values in Kotlin

  • MutableList
  • MutableMap
  • MutableSet
  • Objects and instances of classes that include vars or vals of mutable data

Python does not have the ability to declare variables as immutable. This does not prohibit code that leaves all variables set to their original value, but it does mean it requires careful reading of the code to ensure there is no modification of values.

Python also does not have the ability to define user-defined classes that are immutable. Again, this does not prevent creating code that uses classes as if they are immutable, but again the language is providing no help in ensuring this is the case, nor help in reading the code to know this is happening.

In very small programs which will be read only by the author of the code, neither of these limitations is very significant, but in larger projects with more code that may need to be read by different programmers, this makes immutable classes problematic as a concept in Python.

Shared Modifications – The Devil Inside

As revealed in ‘Nobody Expects‘, reference/pointer variables can share the same data, so that modifications made through one variable also result in changes to the other variables that share the same data.

In C and Java, this can never happen with ‘primitive’ data, as that data does not use the reference/pointer approach. With Python and Kotlin, similar behaviour results from keeping these primitive values immutable. In both kinds of language, the complication of shared changes only arises with changes to elements inside a container.

Consider the following python code:

class Fred:
    def __init__(self, i: int):
        self.i = i

# changes to 'top level' can never be shared changes
# "name = " is always a top level change
a = 3
b = a
a = 7 # no sharing

a = [ 1, 2 ,3]
b = a
a = [ 4, 2, 3] # no sharing

a = Fred(1)
b = a
a = Fred(2) # still no sharing

# if there is a . or a [] left of the = sign, changes may be shared
a = 3 # no . or [] usage is possible

a = [ 1, 2, 3]
b = a
a[0] = 4  # change to element in container - shared change!

a = Fred(1)
b = a
a.i = 2 # change to element in container - shared change!  

Modifying a value that is inside a container such as a list, a dictionary, or an object results in a change that may be shared with other variables that refer to the same container.
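When a shared change is not wanted, one common approach is to give the second variable its own copy of the container. The sketch below uses `list.copy()` and the standard library `copy` module; the variable names are illustrative. Note that a shallow copy only duplicates the outer container, so nested containers are still shared unless `copy.deepcopy` is used.

```python
import copy

a = [1, 2, 3]
b = a.copy()          # shallow copy: b is a new list with the same elements
a[0] = 4
print(a)              # [4, 2, 3]
print(b)              # [1, 2, 3]

nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)      # new outer list, inner lists still shared
deep = copy.deepcopy(nested)     # new outer list and new inner lists
nested[0][0] = 99
print(shallow[0][0])  # 99
print(deep[0][0])     # 1
```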

Kotlin code follows the same rules, but the “.” is not always present in the statement, as an object can be “in context”, where the “object.” part of “object.value = newvalue” is implicit for a block of code.

Class vs Type in Python & Kotlin

As discussed above, C, Java and many other languages have ‘primitive’ types which are quite distinct from Classes and Objects. Python has no ‘primitive’ types, and in Kotlin primitive types are internal to the compiler and only generated as a result of internal optimisation. Programmers may be aware of those internal optimisations in Kotlin and program specifically for them, but outside that use case, in both Python and Kotlin, Class and Type are effectively the same thing.

In computing in general, type and class need not be the same, as in some languages they are different, but in both Python and Kotlin there are only built-in Classes/Types and non-built-in Classes/Types.

Consider this Python code:

>>> class Fred:
...     pass
...
>>> f = Fred()
>>> type(Fred)
<class 'type'>
>>> type(f)
<class '__main__.Fred'>
>>> type(int)
<class 'type'>
>>> type(3)
<class 'int'>

The dummy class Fred is of class 'type', just as with the int class. An instance of Fred is of class '__main__.Fred' just as an instance of int is of class 'int'.
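The same point can be made with `isinstance`, as a minimal sketch. `Fred` is the dummy class from the example above, redefined here so the snippet runs on its own.

```python
class Fred:
    pass

f = Fred()

# A user-defined class and a built-in class are both instances of 'type'
print(isinstance(Fred, type))   # True
print(isinstance(int, type))    # True

# Their instances are ordinary objects in exactly the same way
print(type(f) is Fred)          # True
print(type(3) is int)           # True
print(isinstance(f, object))    # True
print(isinstance(3, object))    # True
```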

The difference between a class created by the program and a built in ‘type’? The class ‘Fred’ gets an internal prefix with the module creating the class (__main__ in the example) and most original pre-defined classes do not have a leading capital: int, float, str, list, dict, bool all start with a lower case, while: None, Exception, Decimal and other newer types follow the conventional naming scheme of starting with an Upper Case Letter.

In Python 2.0 there were more differences with these original built-in types, but Python 3.0 erased the differences from other classes/types and only the lower case letters remain to remind that these names go back to an earlier time.

For more on classes and the variables within classes, see the section “Classes Objects, Class Object & Instances” in the coding section of this site.
