Heap Memory and Garbage Collection.

Already discussed are global static memory, which is reserved before the program starts, and local stack memory, which is allocated at the start of each function and released at the end of that same function.

Heap memory is a third type, for when neither global nor stack memory is suitable. Consider, for example, data that is returned from a function for use by the calling code. This data is still needed after the function completes, so it cannot live in stack memory. If the number of instances of this data, or its size or type, is not known until the program is running, then it cannot have been reserved at a fixed size before the program runs, as with global static memory.

Code can need to allocate memory dynamically (not statically), and to keep that memory beyond the end of the function that allocated it. So for the cases where neither static global memory nor stack memory will do: enter heap memory. Heap memory can be allocated at any time, and released at any time. More flexibility, but more work to manage.

Heap Memory Using ‘C’ Language

Why “Using ‘C’ Language”? Because, while this bliki does not often use the ‘C’ language for explanations, ‘C’ can show the mechanism behind what happens automatically in languages like Python and Kotlin. Remember, the reference Python interpreter (CPython) is written in ‘C’, so in the end, how everything actually works comes down to ‘C’. Because heap memory usage in Python and Kotlin happens automatically and in the background, using ‘C’ can shed light on mechanisms that may otherwise not be seen.

Consider the following ‘C’ code to create a linked list, where each ‘node’ in the list holds an ‘int’ and a link to the next node in the list. The code to focus on is the ‘newList’ function and the ‘frontInsert’ function.

#include <stdio.h>   // for printf, used later when showing the list
#include <stdlib.h>  // for malloc and free
#include <assert.h>  // for assert

typedef struct _node *link; // define link to node in advance

typedef struct _node { // define node
    int value;
    link next;
} node;

typedef struct _list {
    link head;
} *list;

list newList (void){ // the function to create a newList
    list new = malloc(sizeof(*new));
    new->head = NULL;
    return new;
}

void frontInsert (list lst, int item){  // add a node at the start
    assert (lst !=NULL);
    link ptrToNewNode = malloc (sizeof (node));
    ptrToNewNode->value = item;
    link oldHead = lst->head;
    ptrToNewNode->next = oldHead;
    lst->head = ptrToNewNode;
}

void append (list lst, int value){ // add a node at the end
    link linked = malloc (sizeof (node));
    linked->value = value;
    linked->next  = NULL;

    if (lst->head == NULL) {
        lst->head = linked;
    } else {
        link current = lst->head;
        while (current->next != NULL) {
            current = current->next;
        }
        current->next = linked;
    }
}


The ‘newList’ function creates and returns a new ‘list’. Creating the list requires memory that is not declared globally and cannot be on the stack, so the heap must be used, both for the list itself and for the ‘nodes’ in the ‘list’. A call to ‘malloc’ with the number of bytes of memory required returns a block of heap memory of that size.

The function ‘newList‘ calls malloc for the size of the list structure (a struct is like an object without methods), and the functions ‘frontInsert‘ and ‘append‘ both call ‘malloc‘ to allocate memory for nodes.

Python And Kotlin Use of the Heap

Neither Python code nor Kotlin code has explicit calls to ‘malloc’, but in both languages, creating new object instances causes the equivalent allocation to happen automatically, as objects are stored on the heap for the flexible memory allocation it provides.
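
One way to see this happening from Python itself is with the standard ‘tracemalloc’ module. The following is a minimal sketch, not part of the linked list example; the exact byte counts reported will vary by interpreter version and platform.

import tracemalloc

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()   # bytes of heap currently traced

objects = [object() for _ in range(10_000)]   # instance 10,000 plain objects

after, _ = tracemalloc.get_traced_memory()
print(f"roughly {after - before} bytes of heap allocated")  # non-zero: each instance needed heap memory
tracemalloc.stop()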

In Python

The following Python code is the equivalent of the above ‘C’ code. Of course Python already has lists, so this code exists only to look at what is involved when a program makes its own lists, to consider how this is done.

Instancing any object in Python generates the equivalent of a ‘malloc’ call automatically, so inside Node() and NewList() there are allocations of memory from the heap. In fact, all Python mutable objects live on the heap. Strings, numbers, None, True, False and tuples are immutable, so the interpreter is free to cache and share them rather than allocate a fresh object each time, but every instance of a class defined in Python code is allocated on the heap.
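
A quick interactive check illustrates the difference. This is a minimal sketch; small-integer caching is a CPython implementation detail, and the exact behaviour is not guaranteed by the language.

>>> a = 100
>>> b = 100
>>> a is b     # CPython caches small integers, so both names share one object
True
>>> x = object()
>>> y = object()
>>> x is y     # every new class instance is a separate heap object
False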

The code in Python is simpler, and there is no possibility of requesting the wrong amount of heap memory, as the call is automatic and hidden, but memory is still allocated by Node() and NewList(), and any time an object defined by a class is explicitly created. Every Python object has a ‘__new__’ method. While most programmers are familiar with ‘__init__’ methods for initialising objects, there is also a ‘__new__’ method which allocates the memory for the object before ‘__init__’ is called, which is why there is already a ‘self’ at init time.
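
The order of those two calls can be seen with a minimal sketch (the class name ‘Demo’ is purely illustrative):

class Demo:
    def __new__(cls, *args, **kwargs):
        print("__new__ called: memory for the object is allocated here")
        return super().__new__(cls)

    def __init__(self, value):
        print("__init__ called: 'self' already exists and can be initialised")
        self.value = value

d = Demo(5)   # prints the __new__ message first, then the __init__ message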

The only thing of note in the Python code is that ‘next: Any’ should really be ‘next: Optional[Node]’ to be technically correct, but ‘Node’ cannot be referred to directly in an annotation inside its own class body, because the name is not yet defined at that point. (A quoted forward reference, ‘Optional["Node"]’, or ‘from __future__ import annotations’ in Python 3.7+, works around this; since Python does not enforce type annotations anyway, ‘Any’ is a workable solution, just not as clear for readability.)
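
For reference, the quoted forward reference looks like this (a minimal sketch of the alternative, not the code used below):

from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: int
    next: Optional["Node"] = None  # the quoted 'Node' is a forward reference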

from dataclasses import dataclass
from typing import Any, Optional

@dataclass        
class Node:
    value:int
    next: Any = None  # type should be 'Optional[Node]'        

@dataclass
class NewList:  # Python already has its own lists, so NewList
    head: Optional[Node] = None
            
    def frontInsert (self, item):
        self.head = Node(item, self.head)
    
    def append(self, value):
        newvalue = Node(value)
        if self.head is None:
            self.head = newvalue
        else:
            current = self.head
            while current.next is not None:
                current = current.next
            current.next = newvalue

In Kotlin

The code in Kotlin is provided for completeness, but conceptually all is similar to Python. Objects can be thought of as being on the heap. The static nature of Kotlin, and the use of the ‘const’ and ‘val’ keywords together with the ‘object’ declaration in place of a class for singleton objects, does mean the compiler and runtime can optimise the placement of some objects into stack or global memory, much as Python does for cached immutable values, but generally the flexibility of the heap is available, and anything that requires flexible memory allocation will automatically be on the heap.

data class Node (var value:Int, var next: Node? = null)
    
class NewList{
    var head:Node? = null
    
    fun frontInsert (item:Int) {
        head = Node(item, head)
    }

    fun append(value:Int) {
        val newvalue = Node(value)
        if (head == null) {
            head = newvalue
        } else {
            var current: Node = head!!       // safe: the null case was handled above

            while (current.next != null) {
                current = current.next!!     // safe: the loop condition checks next != null
            }
            current.next = newvalue
       }
    }
}

Memory ‘Leaks’ and the Heap

Global memory is allocated once and exists for the entire program; it does not grow or shrink. Stack memory usage increases with every function call, but then automatically shrinks by a matching amount at the end of the function.

Heap memory use increases with every call to ‘malloc’ as objects are created, so the longer a program runs, the more calls to ‘malloc’, and the more heap memory the program could need. Like stack memory, heap memory must also be released so it can be reused once it is no longer needed. Stack memory is released at function end, but for heap memory the whole idea is that it can exist for a longer time. So how does the memory become free again? Well, in ‘C’ there is an explicit ‘free’ function. Unlike stack memory, the release of heap memory is not automatic, so care has to be taken in languages such as ‘C’ to ensure every ‘malloc’ has a matching ‘free’.

// code to allow for returning the memory from above 'C' code
void showAndDeleteList (list lst) {
    assert(lst != NULL);
    link current = lst->head;
    link old;

    while (current != NULL) {   // walk the list, freeing each node after use
        printf("%d ",current->value);
        old = current;
        current = current->next;
        free(old);
    }
    free(lst);
}

Code that may be called repeatedly during the running of a program, and that does not free the memory it has allocated, will eventually result in the computer running out of available memory. As available memory decreases, the computer will slow down as it tries to work with ever-reducing working memory. A failure to free memory that has been allocated via ‘malloc’ is called a memory leak.
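
The same effect can occur even where freeing is automatic: if code keeps references to objects it no longer needs, those objects can never be released. A minimal Python sketch of this kind of ‘leak’ (the ‘cache’ list and the loop are purely illustrative):

cache = []   # long-lived, module-level container

def handle_request(data):
    result = [data] * 1000        # a new object on the heap for every call
    cache.append(result)          # retaining the reference means it is never freed
    return len(result)

for i in range(100_000):          # heap usage grows with every call
    handle_request(i)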

Garbage Collection

Python and Kotlin programs call ‘malloc’ (or its equivalent) automatically in the background. It follows that they must also call ‘free’ automatically in the background as well. The automated releasing of objects or other memory in the heap once that memory is no longer required is called ‘garbage collection’. Both Kotlin and Python have automated garbage collection, which means normal code should not try to manage heap memory manually. Consider the following interactive Python example, where the Node class has a ‘__del__’ method that runs when a node is garbage collected:

class Node:

    def __init__(self, value, next=None):
        self.value = value
        self.next = next 
        print("node init.")

    def __del__(self):
        print(f"Destructor, node {self.value} destroyed!")
>>> a=Node(5)
node init.
>>> b=a  # not a copy, but the same node
>>> del a  # no destructor yet, because there is another reference
>>> del b
Destructor, node 5 destroyed!
>>> 

The ‘__del__’ method is only called when the last reference to the Node(5) is removed, which is when garbage collection happens. ‘del a’ does not garbage collect the Node(5) referenced by ‘a’; it just removes the name ‘a’ as a reference. Only when the last reference to Node(5) is removed does garbage collection happen. This means a program cannot directly free the memory, and must rely on Python freeing the memory when objects are no longer referenced.
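
In CPython this behaviour is driven by reference counting, and ‘sys.getrefcount’ can be used to watch the count change. A minimal sketch, continuing with the Node class above (the reported number is one higher than expected, because the argument passed to ‘getrefcount’ is itself a temporary reference):

>>> import sys
>>> a = Node(5)
node init.
>>> sys.getrefcount(a)   # the name 'a' plus the temporary argument reference
2
>>> b = a                # a second name now references the same node
>>> sys.getrefcount(a)
3
>>> del b                # back to a single real reference
>>> sys.getrefcount(a)
2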

Here is another example, where ‘a’ is simply set to a new value. Again, the Node() is no longer referenced, so ‘__del__’ is called.

>>> a=Node(7)
node init.
>>> a=0  # anything that deletes all references to Node() calls __del__
Destructor, node 7 destroyed!
>>> 

Also note it is not good practice to call ‘__del__’ methods directly, as the results can be strange, as seen below.

>>> # don't call __del__() directly!
>>> a=Node(3)
node init.
>>> a.__del__()
Destructor, node 3 destroyed!
>>> a.__del__()
Destructor, node 3 destroyed!
>>> a
Destructor, node 3 destroyed!
<__main__.Node object at 0x0000025599D04D30>
>>> a
<__main__.Node object at 0x0000025599D04D30>
>>> a
<__main__.Node object at 0x0000025599D04D30>
>>> a.value
3
>>> del a
Destructor, node 3 destroyed!
>>> 

Kotlin has no equivalent to the ‘del’ statement, but it does have an equivalent to the ‘__del__()’ method, which is the finalize() method, as in the example below.

data class Node (var value:Int, var next: Node? = null){

        protected fun finalize() {
            // finalization logic
        }
}

Note that finalize() is dependent on the JVM garbage collector, making it difficult to predict when finalize() will be called; in fact it is not guaranteed to be called at all, and it is available only in the JVM version of Kotlin. In summary, using finalize() is not recommended at all.

Conclusion.

While it is of clear value to understand the concept of the heap, and that most objects will automatically be stored on the heap, in practice when programming in Kotlin or Python those concepts can be kept in the background. It is all handled automatically, best left that way, and only needs to be considered for applications using too much memory or behaving strangely. In most Python and Kotlin software, everything is done automatically, and the only creation or finalisation methods to bother with are ‘__init__’ methods, plus ‘__post_init__()’ for Python data classes, and constructors for Kotlin.

The heap is an overhead. For most applications, that overhead is completely secondary to keeping algorithms efficient, and the rare applications where the overhead does matter should not be written in Python at all, but rather in ‘C’ or another native-code language. With Kotlin, native techniques do allow restricting heap usage to where it is needed, as opposed to places where the flexibility it provides will not be used and could instead be traded for speed.
