Languages: Compilers, Interpreters

Kotlin is a compiled, statically typed language; Python is an interpreted language. While you could interpret Kotlin or compile Python, this post examines how the design of each language fits its usual mode of translation: compiled for Kotlin, interpreted for Python. These are not just implementation choices, but are baked into the respective languages.

TL;DR: check the headings, and only read the sections of interest

  • Review: What are Computer languages?
  • Why Have Compilers & Interpreters?
  • Translate Please!
    • Translation: Compile Vs Interpret
      • Compilers
      • Interpreters
      • Multi-Step Translations
  • Interpreter vs Compilers: The Workings
    • Interpret = ‘do’ now, compile = declare now, do later
    • Do now requires dynamic
    • Python def (and lambda): Python ‘Do Later’?
    • Kotlin fun – declaration vs action.
    • Classes: Interpret = dynamic, compile = static
  • Role Reversal:
    • compiling Python
      • *.pyc files
      • just in time compilation
      • RPython
    • interpreting Kotlin
      • Kotlin REPL
      • Kotlin Script
  • Summary

What are Computer Languages?

There can be a misconception that programming languages are the languages of computers and reflect how computers work. They do not.

The actual language of computers is zeros and ones. Those zeros and ones are to the computer what synapses are to our brains. Trying to picture how a computer completes a complex task in terms of those zeros and ones has similarities with trying to picture how to play a game of football in terms of the synapses that should fire in our brains to play well.

Human languages did not evolve to best reflect synapses and the inner technology of our brain; instead human language evolved to best express the things we need to communicate.

Similarly, computer languages did not evolve to best reflect the inner workings of computers, but instead they have evolved to best express what we want computers to do.

Computer languages are designed to best allow humans to express what humans want computers to do.

Computer languages are for humans to create instructions for computers, and those instructions must be translated before computers can use them. Computing is not about learning the languages of computers; it is about learning to abstract what computers are required to do, so that the requirements are expressed clearly. That abstract expression must be able to be translated into the minute details of the internal machine code of the computer. Fine details may individually be programmed in assembler language or machine code, but building complete programs this way is like creating the sequence of synapses required to play football.

Why Compilers and Interpreters?

We as humans do not control our synapses directly anyway, but even if we did, the sequence of instructions to learn to play football would be just too long and too detailed to deal with directly. We would want repeated steps such as “kick with the right foot” to be automatically translated into synapses. Thinking in terms of ‘kick’ instead of the synapses encapsulates the concept of programming.

Programming is all about abstracting, but abstracting in a way that can be translated into those ‘synapses’, because we want a robot soccer player.

The program must be expressed as abstractions, along with ways to translate those abstractions into lower level abstractions. Eventually the abstractions must be translated into the ‘synapses of the computer’, and every level of translation is performed by a computer running a program. These translating programs fall into two categories: compilers and interpreters.

Translate Please!

Translation: Compile vs Interpret.

Compilation and interpretation are the two approaches to translating from a language designed for programming into a language more suitable for the computer.

Compilation is to translate an entire program file into another language, without executing any of the program before completing the translation. It is like sending a book away to be translated and waiting for the translation before reading the book. The entire book can be read, in order to ‘compile’ information, before producing the translation. This means that at any time during translation, any part of the translated text could be revised or updated, because the translated text will not be read until the translation is declared final. So during translation, the translator could revise the text of an earlier chapter on realising, with more understanding, that a better translation is possible.

Interpretation is interactive translation. With interpretation, each program statement is immediately translated and must be acted on by the interpreter before moving on to interpret the next statement. It is like the ‘live’ translation that happens at the United Nations. Translation and running the program are interleaved. As soon as each step is translated it is ‘read’ by the reader, so if the translator later wants to revise the translation, it is too late: the earlier translation has already been read.
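As a rough sketch of the difference, the Python below feeds the same three statements to a statement-at-a-time loop and to a whole-program translation. The helper names `run_interpreted` and `run_compiled` are made up for this illustration, and the built-in `compile()` and `exec()` are only standing in for a translator and a runner; this is not a description of how Python or Kotlin is actually implemented.

```python
# Hypothetical helpers, for illustration only.
STATEMENTS = [
    "out.append(1)",
    "out.append(2)",
    "out.append(",  # deliberately broken third statement
]

def run_interpreted(statements):
    """Translate and run one statement at a time."""
    out = []
    env = {"out": out}
    try:
        for statement in statements:
            exec(statement, env)  # each statement is acted on before the next is read
    except SyntaxError:
        pass  # the earlier statements have already taken effect
    return out

def run_compiled(statements):
    """Translate everything first, then run."""
    out = []
    env = {"out": out}
    try:
        program = compile("\n".join(statements), "<whole program>", "exec")
    except SyntaxError:
        return out  # translation failed, so nothing was ever run
    exec(program, env)
    return out

interpreted_result = run_interpreted(STATEMENTS)
compiled_result = run_compiled(STATEMENTS)
print(interpreted_result)
print(compiled_result)
```

Under these assumptions the statement-at-a-time version has already appended two values by the time it reaches the broken line, while the whole-program version appends nothing, because its translation never finished.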

Multi-step translation is also possible. For example: compilation followed by another compilation, or compilation followed by an interpreter. Two compilation steps are like sending a document to be translated from Norwegian to English, and then sending the English document to be translated into Latin, because no one is available who can translate directly from Norwegian to Latin. With multiple steps, a compilation step cannot follow an interpreter step: once interpreted, it is too late to re-translate.

Interpreters vs Compilers: The workings.

Interpret = ‘do’ now, compile = declare now, do later

Interpreters not only translate each statement independently, they directly do the instructions of the program. So in the Python code below:

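>>> print("foo" in globals())  # no foo in globals yet
False
>>> foo = "hello"
>>> print("foo" in globals())  # foo is now in globals
True
>>> print(foo)
hello
>>> del foo
>>> print(foo)  # foo no longer exists
Traceback (most recent call last):
  ...
NameError: name 'foo' is not defined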

each statement is an instruction to Python to ‘do’ something. Each “print” line is an instruction to print something, and the output appears immediately as the line of code is processed. The prints also show that there is no global ‘foo’ until after the ‘foo = "hello"’ statement. That statement creates the variable ‘foo’ in the Python run time environment as a global, and sets ‘foo’ to refer to the string “hello”. The creation of ‘foo’ happens as the line is processed: the same statement adds “foo” to ‘globals()’ and sets it to “hello”. Although this example is shown in IDLE, the exact same behaviour occurs if the statements are in a Python file run with the ‘python’ command at the command line. However for Kotlin, compiling the code:

var foo = "Hello"
fun main() {
    print(foo)
    println(bar)
}
var bar = " there"

// output is:
// Hello there

does not ‘do’ what the code says. Every statement is a declaration, a description of what something looks like. The first line declares ‘foo’, and then ‘main’ and ‘bar’ are declared in turn. There has to be a ‘main’ function, unlike with Python, because the code must declare the action that takes place when the program runs. Lines such as print require action, so they can only be declared as part of an action: they are part of declaring what ‘main’ will do when the program is run.

Compilation just processes declarations, and processes all declarations before the program runs. When this code runs, both ‘foo’ and ‘bar’ exist before ‘main’ starts, even though ‘bar’ is declared after ‘main’ in the file. All declarations are complete during compilation, which is prior to the program running. The code is declaring, or describing, ‘foo’, ‘main’ and ‘bar’, and all three exist at the start of the program. The compilation processes the declarations and does not ‘do’ anything, or execute the program.

In fact, running the Kotlin compiler on the file from the command line (‘kotlinc <filename>’) will not run this code at all. It instead produces compiled class files (or a jar file when asked for one), which require additional steps to run. IntelliJ bundles all these steps so that pressing ‘run’ compiles the file and then runs the result, but it is important to remember these are two separate steps, and the Kotlin compiler only completes the first.

Interpretation is a single step, the ‘do’ step; compilation is a declaration step and does not include the ‘do’ step. This means that for compiled languages, all declaration is complete before the program starts, while with an interpreted language, declaration happens at run time. So at the start of the interpreted program nothing is declared, and as the code executes more and more is declared. In fact Python code can even remove declarations, a concept which makes no sense in Kotlin, where declarations are final before the program starts.

print("foo" in globals())  # using print so also works in Python file
foo = "hello"
print("foo" in globals())
print(foo)
del foo
print("foo" in globals())

Running the above Python shows how ‘foo’ comes into existence during the running of the code, and is then removed from existence later in the program. By contrast, in compiled Kotlin code, every global element exists for the entire program.

‘Do Now’ Requires Dynamic.

All interpreted statements perform an action immediately, and any effects on the runtime environment happen immediately. This automatically means that declarations which could require multiple interpreted statements must be dynamic, as what is being declared is constantly evolving with each new statement.

All compiled statements must be declarations, since nothing will happen immediately. The separation of compile and run means all declarations can be seen as complete and in place before code runs, so dynamic behaviour is not required.

As compilation is a separate process, declaration information can be held internally by the compiler as it ‘compiles’ information on the program. Not all of the information compiled need play a role in the output of the compiler. With an interpreter, on the other hand, it is not clear what may happen later, so all code must be executed, with no ability to determine whether the results will be needed by later code.
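A small sketch of what ‘dynamic’ means here, using made-up names (`polite`, `greet`): which version of `greet` gets declared depends on a value that is only known while the code is running, and a later statement is free to replace that declaration.

```python
polite = True  # stands in for a value only known at run time

# Each 'def' is a statement, so only the branch that runs declares 'greet'.
if polite:
    def greet():
        return "good morning"
else:
    def greet():
        return "hi"

first = greet()

# A later statement can replace the declaration while the program runs.
def greet():
    return "hi"

second = greet()
print(first, "/", second)
```

The same name refers to two different functions at two points in the same run, which is only possible because each declaration is an action on the run time environment.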

Python Def (and lambda): Python ‘Do Later’?

Python ‘def’ and ‘lambda’ have code that is saved to be run later. The uncertainty as to how later code could alter existing declarations, or add new declarations, means that saved Python code cannot be sure what any identifier will mean when the code is run.

Consider:

>>> def do_hello():
...     hello()
...
>>> print("do_hello" in globals())
True
>>> def hello():
...     print("Hello World")
...
>>> do_hello()
Hello World

Each ‘def’ statement, because it is interpreted, must ‘do’ something immediately. The immediate action of the ‘def’ statement is: create a variable and assign the code of the def to that variable. In Python, the def is the one case where the code itself is not run immediately. So ‘def do_hello’ creates a global ‘do_hello’ as soon as the def has been completed, and sets the ‘do_hello’ variable to the code of the def, which in this case calls ‘hello’, even though ‘hello’ is undeclared at the time of this def statement. This means a call to ‘do_hello’ at the time the code is saved would fail, as there is no definition for ‘hello’ at that time. However, a definition for ‘hello’ might be added before calling ‘do_hello’, as in the example above. This works because the Python code in ‘do_hello’ checks ‘globals()’ at run time to find ‘hello’. By checking ‘globals()’ at the time the code is run, and each time the code is run, the value of ‘hello’ can be set at any time, and the value can even be changed between different runs of the code, as shown below.

def do_hello():
    hello()

def hello():
    print("hello there")

def bye():
    print("bye")
>>> do_hello()
hello there
>>> hello = bye  # set the name 'hello' to now refer to the function 'bye'
>>> do_hello()  # the 'hello' call inside do_hello() now calls bye()
bye

Resolving names at run time provides even more flexibility than may be expected, but it is needed in order to allow code to reference functions, variables and classes that are declared later in the code than the point where their names are used.
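The run time lookup can be observed directly. The sketch below assumes CPython run as an ordinary script, and uses the `__code__.co_names` attribute, which lists the global and attribute names a function's saved code refers to; the variable names `names_used`, `failed_early` and `result` are just for this illustration.

```python
def do_hello():
    return hello()

# The saved code holds only the *name* 'hello', not a link to any function.
names_used = do_hello.__code__.co_names

try:
    do_hello()
    failed_early = False
except NameError:
    failed_early = True  # 'hello' does not exist yet

def hello():
    return "Hello World"

result = do_hello()  # the same saved code now finds 'hello' in globals
print(names_used, failed_early, result)
```

Nothing about `do_hello` changes between the failing call and the successful one; only the contents of the global namespace do.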

Kotlin fun: declaration vs action

With a compiler, before any code is run, all code has been processed so that the compiler can ‘compile’ information on all declarations. All names are linked to the relevant class, function or variable during compilation. Further, with class, function and variable declarations creating data within the compiler instead of in the run time environment, these names do not have associated variables. So with

fun do_hello(){
   hello()
}
fun hello(){
   println("hello there")
}

in Kotlin, the function ‘do_hello’ cannot be written unless ‘hello’ is declared elsewhere in the code. That elsewhere can be before or after in Kotlin, but it must be declared. Also, no variables are created for ‘do_hello’ or ‘hello’ within the run time environment, as happens in Python, so the call to ‘hello’ cannot be changed within the program. To act as the Python code above does, the program would need to be:

fun do_hello(){
   hello()
}
fun main(){
    do_hello()
    hello = { println("bye")}
    do_hello()
}
var hello = {
   println("hello there")
}

Classes: Interpret = dynamic, compile = Static

Consider the following class block code in Python:

class FooBar:
   count = 0
   def __init__(self, a):
       self.a = a
   print("FooBar")

Everything works the same as it would outside of a class, only instead of new entries being created in ‘globals()’, they become attributes of the ‘FooBar’ object. Both the statements for ‘count’ and ‘__init__’ bind new names, so they are added to the ‘FooBar’ object, while the print call (technically an expression statement) binds no name, so it behaves exactly as a print in the global context would. In Python, assignments within a class work just as they do in the global scope. In practice, class bodies are mostly ‘def’ statements with the occasional class-level assignment, because every statement in the body runs during the class declaration, and side effects such as print or other function calls at that point produce unexpected behaviour. Each line is executed as a step, adding more information to the class object, which can then be further modified at a later time as outlined in Static vs Dynamic types.

In Kotlin, as a compiled language, there is no need for a ‘type’ object to hold the properties of the class, as they are held in an internal definition in the compiler. Kotlin has added the concept of the ‘companion’ object, which provides for holding values common to all members of the class, mirroring the ‘type’ object of interpreted languages. It is not an exact equivalent, though, as it does not take the role of holding all data related to the class, as is the case with interpreted languages.
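A short sketch of this step-by-step behaviour, reusing the ‘FooBar’ class from above. The `events` list and the `describe` method are additions made up for this illustration.

```python
events = []

class FooBar:
    count = 0
    events.append("class body is running")  # side effect happens at declaration time

    def __init__(self, a):
        self.a = a

events.append("class statement finished")

# Names bound in the class body became attributes of the class object.
has_count = "count" in vars(FooBar)
has_init = "__init__" in vars(FooBar)

# The class object can still be changed after the class statement.
FooBar.describe = lambda self: f"FooBar({self.a})"
description = FooBar(3).describe()
print(events, has_count, has_init, description)
```

The order of the entries in `events` shows that the class body ran as ordinary statements before the line following the class, and the late addition of `describe` shows the class object remains open to change.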

As stated, writing a compiled program is like having an entire book translated (compiled) before it is read (run). Analogies are never perfect, though, and this analogy ignores the need for testing. Programs actually consist of separate parts, compiled independently and then linked together in order to be run.

OK, so writing a compiled program is like writing a book that will also contain appendices (dependencies) written previously, potentially by other people, and having the book translated (compiled) before it is reviewed by the publisher (tested). The publisher may require alterations, which will again need to be translated. If the book is divided into chapters, then only changed chapters would need retranslation. Sometimes a change in one chapter will mean that another chapter based on that chapter will need re-translation. After all translations are complete, there is the task of putting together the book. What if chapters otherwise unchanged refer to “the diagram on the second page of chapter 3”, but chapter 3 has changed and the diagram is now on page 4 of that chapter? All parts must be put together as a complete work, and this is the equivalent of linking. Back to programming: linking is ensuring that everything needed for the program is available so the program can run, and that all internal references within the program are correct and up to date.

Role Reversal

Compiling Python

With .pyc files created with every import, surely Python is actually often compiled?

Line Compilation vs Program Compilation: .pyc files

A compiler processes all code prior to any ‘run time’, while an interpreter processes the code interactively. However, this does not require the interpreter to process every character interactively. Python still waits until it has an entire statement before that statement is processed. This means that individual statements can be ‘compiled’, so an entire ‘def’ statement or ‘class’ statement could be optimised. There are limits to the optimisations: with def statements the declaration of identifiers may change before the code is run, and within class statements the class itself is still changing with each statement within the class block. Even so, there are clear optimisations. Considering again the example:

def do_hello():
    hello()

As the code referenced by ‘hello’ could change before the code is run, or even between runs, ‘globals()’ must be checked every time this code runs. Internally, globals() is a dictionary, or hashmap, so each call means looking up the hash of “hello” in that hashmap. Keeping the hash of “hello” instead of recalculating it every time is a worthwhile optimisation (CPython, for example, caches the hash inside each string object once it has been computed). In fact Python code is full of strings that can be hashed and saved.

Translation work of this kind, such as parsing the source and collecting the names it uses, can be done once and saved as bytecode in a .pyc file, saving a great deal of time at Python run time, while still keeping each statement as separate dynamic code. The process is quite different to compilation of the entire program, where all declarations would be in place before the code is run.

So *.pyc files are optimised, but do not represent a full compilation, and are still run by the Python interpreter, and still require the Python interpreter.
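To make the point concrete, here is a minimal sketch that mimics what happens around a .pyc file without touching the file system. It assumes CPython, where a .pyc is a short header followed by a code object serialised with the standard `marshal` module; the source string and variable names are made up for the example.

```python
import marshal

source = 'def hello():\n    return "hello there"\n'

# Step 1: translate the whole source into a code object (nothing is run).
code = compile(source, "<demo>", "exec")

# Step 2: serialise the code object, roughly what gets stored in a .pyc.
payload = marshal.dumps(code)
restored = marshal.loads(payload)

namespace = {}
declared_before = "hello" in namespace  # the 'def' has not been acted on yet

# Step 3: the interpreter still has to run the code for 'hello' to exist.
exec(restored, namespace)
declared_after = "hello" in namespace
greeting = namespace["hello"]()
print(declared_before, declared_after, greeting)
```

Even after the translation has been saved and reloaded, ‘hello’ is only declared when the interpreter executes the ‘def’, which is the sense in which a .pyc is not a full compilation.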

Just In Time Compilation.

Just-in-time compilation is another form of compiling piece by piece rather than compiling the entire program ahead of time. It is comparable to *.pyc files, but with the optimisations created just prior to running the code concerned.

RPython.

RPython, or ‘restricted python’ is genuine complete Python compilation. The language of RPython has to be changed from regular Python to enable genuine compilation, and RPython specifically states it is to be used only for the PyPy program. With only one program compiled, it is a very specialised project.

Interpreting Kotlin.

The Kotlin REPL.

With IntelliJ, the menu command Tools/Kotlin/Kotlin REPL activates a Kotlin REPL (Read Evaluate Print Loop). This REPL does allow an evolving runtime environment as with Python, and while it can be very useful, it is not quite the same experience as delivered by IDLE in Python, simply because the concept works better with interpreted languages designed for each statement to be run immediately. One of the first clues is that, unlike with IDLE, pressing [Enter] does not run the code so far; that requires a special [Control-Enter].

Kotlin Script

Watch this space. There definitely is a thing called Kotlin script, and although it still has a ‘whole program processed before running’ approach, it does allow ‘scripting’ and is behind the move of Gradle to using Kotlin. At this time, we have not found ways to use Kotlin script in a more general manner, but will update this page when we do.

Summary

Interpreters: ‘do’ now

  • actually run the code to be interpreted
  • each interpreter statement has immediate actions
  • each declaration creates an identifier in the run time environment
  • the run time environment cannot be static as each declaration changes the environment
  • running interpreted languages is a single step process

Compilers: declare now, do later (2 steps!)

  • do not run the code, but produce a translation of the code
  • use the code to create declarations with the compiler
  • as program execution is a separate step, all declarations are finalised prior to running the program
  • all identifiers exist for the life of the context of the identifier
  • each environment (global or within a function) is static for the life of that environment
  • running compiled languages requires separate steps: compile, assemble/link, then run

Footnote: Assemblers

Assembler: names for machine instructions and memory locations

In fact even when speaking of ‘machine code’, what is normally assumed is an assembler. An ‘assembler’ is a program that reads a ‘source file’ and produces actual machine code, in a format that can be read by a ‘loader’, which is a program to load the machine code into the computer memory.

Assembler code gives the instructions names in place of numbers, and combines the ‘register number 7’ part into the ‘move from memory into register’ instruction to make a single number, as in machine code the register number is just part of the overall number or ‘instruction code’. Memory locations are then given names. The result is a slightly readable program, and using the assembler is what we actually mean when we say we write machine language, as no one works with the actual zeros and ones to write their program.
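A toy sketch of that idea follows. Everything in it is hypothetical: the three-instruction set, the opcode numbers, the ‘R7’ register syntax and the memory addresses are invented for the illustration and do not correspond to any real processor or assembler.

```python
# Hypothetical instruction set: opcode in the high bits, register number in the low bits.
OPCODES = {"LOAD": 0x10, "STORE": 0x20, "ADD": 0x30}

def assemble(lines, symbols):
    """Turn lines such as 'LOAD R7, total' into a flat list of numbers."""
    machine_code = []
    for line in lines:
        mnemonic, register, name = line.replace(",", " ").split()
        register_number = int(register[1:])  # 'R7' -> 7
        # The register number is folded into the instruction to make one number.
        instruction = OPCODES[mnemonic] | register_number
        # The named memory location is replaced by its numeric address.
        machine_code += [instruction, symbols[name]]
    return machine_code

symbols = {"total": 0x40, "price": 0x41}  # made-up addresses
program = ["LOAD R7, total", "ADD R7, price", "STORE R7, total"]
machine_code = assemble(program, symbols)
print([hex(number) for number in machine_code])
```

Here ‘LOAD R7’ becomes the single number 0x17, and the name ‘total’ becomes the address 0x40, which is the naming service an assembler provides over raw numbers.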
