DSL Methodology: A key software concept.

With Kotlin, the term ‘DSL’ has taken on a specific meaning, and that more specific meaning is explored in another page on Kotlin DSLs, such as kotlinx.html, and how to write them.  DSL methodology is a key reason why a language's ability to support domain specific extensions becomes important.

This page concentrates on the concept of DSL methodology.  DSL methodology treats software development as the task of creating the component tools that allow what the program does to be expressed within a single concise function, in a manner that not only executes correctly but is also easy for a person to read, understand and, when necessary, modify.

  • Introduction to DSLs
    • DSL: the general definition
    • DSL: independent language or language extension?
    • When does a program create a DSL?
  • Human Language and DSLs
    • Why consider human languages?
    • The Dictionary is an insufficient reference.
    • Jargon as a form of DSL
    • Situation Specific Language extensions
    • Specific Human language: Independent Language vs Extensions.
    • Conclusion
  • DSL Methodology: The Basics
    • Simple language extension
    • DSL Methodology: Building Blocks
    • More Layers
    • The Core concept
    • When to apply DSL Methodology
  • Implementing DSL Methodology
    • The goals
    • Software modules: group extensions together
    • Leveraging existing Language Extensions
  • Conclusion

Introduction

DSL: The general definition

DSL is an acronym for Domain Specific Language, where a ‘domain’ is effectively a specific application area.  In contrast with ‘general purpose languages’, which attempt to allow for solving any programming problem, a DSL can be purpose designed for a specific ‘domain’ or a specific type of problem.

DSLs can sound like using Kotlin (or Python) to build an entirely new language, and this can seem true, but the reality is simpler. All Kotlin DSLs are extensions of Kotlin, and although some use cases focus on the extension and make little use of the underlying language, that underlying language is always still there.

Two Types of DSL: Independent DSLs and Language (extension) DSLs

Independent DSLs

There are DSLs like SQL, which are stand-alone languages that tackle a specific field or domain.  The first problem with independent DSLs is that a general purpose language is almost always also required.  This problem can be solved by combining a DSL with a general purpose language, as when using SQL from Python. This does work, although you then have two different languages.  The second problem with independent DSLs is that the features of the general purpose language are not accessible from within the DSL, so the DSL has to duplicate features of general purpose languages, and the duplicated features are generally inferior to the originals.  For example, numeric expressions in SQL are not as powerful as those in most general purpose languages, and there is often a change of syntax from the general purpose language.
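As a minimal sketch of the ‘two languages’ situation, the example below uses SQL from Python through the standard library sqlite3 module. The orders table and its contents are assumptions made up for illustration. Note that the query is SQL held inside a Python string, so Python itself cannot check or extend it.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (item TEXT, price REAL, quantity INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("pen", 1.5, 4), ("book", 12.0, 1), ("ink", 6.0, 2)],
)

# The query is written in SQL, a separate language held in a Python string.
rows = conn.execute(
    "SELECT item, price * quantity FROM orders "
    "WHERE price * quantity > 5 ORDER BY item"
).fetchall()

# Work that SQL does not express conveniently falls back to Python.
totals = {item: total for item, total in rows}
conn.close()
```

The arithmetic lives on the SQL side and the dictionary building on the Python side, which is the split between two languages described above.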

Language (extension) DSLs

There is also a solution where a DSL is an extension to a general purpose language. For example, in place of developing in SQL with Python, SQLAlchemy brings the power of the SQL language to Python.

When we use the term ‘Kotlin DSL’ or even ‘Python DSL’, we mean a DSL made by extending Kotlin or Python with extra ‘vocabulary’ for domain specific features.  The DSL is a set of new vocabulary that extends an existing language.  This type of DSL is seen as the preferred solution. With an extension there is less syntax to learn and greater consistency, because in place of a new independent language we use extra ‘vocabulary’ for a language we already know.
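To illustrate the idea of extra ‘vocabulary’ inside an existing language, here is a small sketch of a filtering DSL written as ordinary Python. The names Field, Condition and where are hypothetical and invented for this example. This is not how SQLAlchemy is implemented and it does not use SQLAlchemy's API; it only shows the general technique of extending a language with operator overloading.

```python
from typing import Any, Callable, Iterable

Row = dict[str, Any]


class Condition:
    """A test on a row; conditions combine with & like ordinary values."""

    def __init__(self, test: Callable[[Row], bool]) -> None:
        self.test = test

    def __and__(self, other: "Condition") -> "Condition":
        return Condition(lambda row: self.test(row) and other.test(row))


class Field:
    """A named column; comparing it yields a Condition instead of a bool."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __gt__(self, value: Any) -> Condition:
        return Condition(lambda row: row[self.name] > value)

    def __lt__(self, value: Any) -> Condition:
        return Condition(lambda row: row[self.name] < value)


def where(rows: Iterable[Row], condition: Condition) -> list[Row]:
    """Keep only the rows that satisfy the condition."""
    return [row for row in rows if condition.test(row)]


# Hypothetical data, used only for illustration.
orders = [
    {"item": "pen", "price": 1.5, "quantity": 4},
    {"item": "book", "price": 12.0, "quantity": 1},
    {"item": "ink", "price": 6.0, "quantity": 2},
]
price, quantity = Field("price"), Field("quantity")

# The query reads as domain vocabulary, yet it is ordinary Python throughout.
cheap_bulk = where(orders, (price < 10) & (quantity > 1))
```

Because the query is plain Python, everything else in the language (variables, functions, loops) remains available inside it.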

When does a program create a DSL?

When does adding new features to an existing language create a new DSL? As with many things, the extremes are evident. The “hello world” program would not be regarded as creating a DSL, and at the other extreme, some packages clearly implement a DSL. However, if a package had half the features, would it still constitute a DSL? At what point does adding ‘vocabulary’ meet the definition of a DSL?

The reality is that every program, including “hello world”, creates some new vocabulary somewhere; in the case of “hello world” that vocabulary is just not very useful. If the program is called “hello”, then the computer gains new vocabulary in ‘the shell’, such that typing ‘hello’ now does something: it prints “Hello world”. The shell gains “hello” as one new word. While one word is not much of a language extension, the concept of extending a language is common throughout programming, so the principle of building a DSL always applies at some level.

Human language and DSLs

Why consider human languages?

A significant part of the human brain has evolved specifically to process language.  Since the first move from machine code to assembler, the goal has been for computer programs also to be processed by humans.  Just how do our brains handle domain specific language?  Don’t we take years to learn one single language and find learning another quite difficult?

Jargon as a DSL equivalent

One spoken language parallel to a computer language extension DSL is ‘jargon’. Many ‘domains’ evolve their own jargon, or extensions to the language. Jargon communicates the concepts needed in a specific domain more concisely and more precisely than regular ‘non-jargon’ language does, but it is generally used in combination with an underlying general-purpose language, such as English, French or Chinese. People who already speak a general-purpose language can learn one or more jargon vocabularies in a much shorter time than it takes to learn the general-purpose language.

The dictionary is an insufficient model.

If you are reading this document, it can be assumed you can read English, and the reference for English is the dictionary.  But there may be words even on this page where the dictionary is not that helpful, because the dictionary covers which words are part of the language, but not how they combine to provide meaning, nor a full understanding of the concepts behind the meaning.  Wikipedia can be a useful source of far more information behind the words, but there are also words that change with context.  All of this means the language becomes more like the syntax for expressing meaning, and it can take a lot of language to actually convey meaning.  In the same way, Python and Kotlin have a language syntax, and language extensions are built within that base syntax.  Terms become familiar to us over time, but until they are familiar we may have to go to the dictionary or the encyclopedia to understand what is written, and for context specific things, like where something is, we may have to ask.  All of this is paralleled in code by reading the definition of an object or function; once we are familiar with it, we should not need to keep referring to that source.

Situation Specific Language extensions.

However, humans also learn far more localised and situation specific language extensions.
Consider a random page from a novel. Most of the words can be found in the appropriate language dictionary, or an encyclopedia, because they are part of the general language (e.g. English). But there are words that are not sufficiently explained by either dictionary or encyclopedia, because they have a specific meaning in the context of the novel, and the meanings are explained through the novel.  Names are one class of such words. Names can make reading a page at random a challenge. Read a random page from a novel and we skip the explanation of the specific meaning.  Just who is ‘Harry’ or ‘Sally’? Have they met?  What is their relation to the protagonist? Novels are designed to be read sequentially, so there is no index to easily find what has been defined, and usually no clear list of what is defined.  Depending on the novel, significant amounts can be specific to the novel.  Consider Lord of the Rings.  Not only are characters explored, but also types of creatures, new locations and an imaginary world.  It can be described as creating a “Lord of the Rings universe”.

So even to navigate an individual literary work, a new extended vocabulary can be required, varying from knowledge of just a few character names through to an entire altered universe.

Specific Human language: Independent Language vs Extension.

Jargon can seem impossible to understand for a ‘layman’ who only speaks regular ‘non-jargon’ language. The reality is that jargon normally does not replace regular language, and communication in jargon alone is rarely sufficient. Even very domain specific communication requires a mixture of regular language and jargon together.

Imagine, for example, a French person and a Chinese person who are both from the same industry and use the same jargon, but other than that jargon are unable to communicate. Even adding words like ‘very’ to the jargon would be a problem. Like independent DSLs, jargon needs to coexist with a general purpose language.
Just like the French and Chinese colleagues, every independent DSL also needs some ‘normal words’, so each adds its own limited set of ‘normal words’. This means independent DSLs have to revisit many things already present in general programming languages, and the result is still restrictive.
The hypothetical French and Chinese colleagues would be far better placed if they both spoke a common regular language in addition to the jargon.

Conclusion

Human communication in human language actually relies on language extension using a base set of rules.  This suggests that our thinking should be well adapted to the same approach within programs.

DSL Methodology: The basics

Simple Language extension.

Language extension is the core programming concept of defining things.  Even a variable definition defines a new language element. For more significant language extensions that arrive in blocks, Python has ‘import’, and to increase the scope of what can be imported there is ‘pip install’.
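A minimal sketch of these simple extensions follows. The names KM_PER_MILE, miles_to_km and the sample distances are assumptions chosen only for illustration.

```python
from statistics import mean  # 'import' adds a ready-made word: mean

KM_PER_MILE = 1.609344  # a variable definition adds a noun


def miles_to_km(miles: float) -> float:  # a function definition adds a verb
    return miles * KM_PER_MILE


# Later code is written in the extended vocabulary.
weekly_runs_miles = [3.0, 5.0, 10.0]  # hypothetical data
average_km = miles_to_km(mean(weekly_runs_miles))
```

Each definition is small, but the last line is already written in terms the base language did not contain.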

DSL methodology: building blocks

It is generally agreed that there is a maximum number of lines for a well defined function. Opinions on the actual limit vary, but recommendations generally range from what can be seen on the screen at one time through to as high as around 100 lines.

Now consider that every program is described by a single function, usually called ‘main’.  With a very simple program, all the code could be held in main, but as the code grows, that limit of around 25 to 100 lines in one function will become a constraint.

How to describe the program in the main function, and keep main small enough to read and understand?

As the program grows, the developer can move some code to ‘other functions’ and in main simply call these functions containing the moved code. This is one way the size of main can be controlled.

But simply moving ‘chunks’ of main into functions is not DSL methodology. DSL methodology is to create building block functions that allow the logic of main to be written in a more concise way.  The logic of main stays in main; only the logic that turns steps into extensions of the language is moved to the functions.

Main stays readable if the concepts of the functions are clear.  Usually any functions with ‘moved code’ will need to be generalised to convert them into building blocks, and the application specific behaviour then comes from the parameters specified when those blocks are called. Understanding what the program does can then still be clear just from considering the main function.  A new developer may not need to read beyond main to understand what the program does, and may limit going beyond main to an area of functionality of particular interest, inferring the meaning of the other new ‘vocabulary’.
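As a rough sketch of the difference, the example below shows the same small program twice: once with a ‘chunk’ moved out of main, and once with generalised building blocks. The temperature readings and all function names are hypothetical, chosen only for illustration.

```python
READINGS = [28.5, 31.0, 33.5, 29.0]  # hypothetical daily temperatures in Celsius


# "Moved code": the application rule (above 30 is hot) is buried in a helper.
def process_readings(readings: list[float]) -> str:
    hot = [r for r in readings if r > 30.0]
    return f"{len(hot)} of {len(readings)} days were hot"


def main_with_moved_code() -> str:
    # main no longer says what the program does
    return process_readings(READINGS)


# Building blocks: generic vocabulary with no knowledge of this application.
def above(threshold: float, values: list[float]) -> list[float]:
    return [v for v in values if v > threshold]


def proportion(part: list[float], whole: list[float], label: str) -> str:
    return f"{len(part)} of {len(whole)} {label}"


def main() -> str:
    # The logic stays in main; the specifics arrive as parameters.
    hot_days = above(30.0, READINGS)
    return proportion(hot_days, READINGS, "days were hot")
```

Both versions produce the same result, but only the second lets a reader learn the rule (30 degrees, hot days) from main alone, and only its helpers could be reused by another program.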

More Levels

The concept is that main describes the program at the top level, but main will need either program specific building blocks (‘extended language’) or language building blocks, known as packages, which are common to several applications, or both.  In the case of “hello world”, the only other function is the print function, and that function is considered part of the language.  If you know the Python language, you know how the Python print works.  However, if the other functions beyond main are the ‘moved code’ described above, then these other functions will most likely be unique.  They are an extension to the language for the use of this main function, and unique to this main function.  While the function names can convey what is done at a high level, to know exactly what these functions do, a person reading the program will need to go and read these functions.  When reading main, this is the equivalent of looking up a word in the dictionary.

As the solution grows in detail, these functions in turn become complex and will also require their own extensions to stay within size constraints. As the program grows, the number of levels of extension to the original language grows.

The Core concept.

The core concept is that the end result is a main written using an extended language, with those extensions built on other extensions. Each level should be readable without looking up in full detail what each component of the new language means.  Each level is written in terms of an underlying extended language, and the program should be broken up into components that define new language blocks or ‘jargons’, which are separate from the layers built using those jargon language blocks.

The role of each level or block is to provide the language extensions that make the level above simple to understand. The lowest level of the program is the only level written in Python or Kotlin (or whatever the base language is) itself, as all other levels are built on the extensions.
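The layering can be sketched as follows, assuming a hypothetical word counting program. All function names are invented for illustration. Only the lowest level touches plain Python; each higher level is written in the vocabulary of the level below it.

```python
# Level 0: the only level written in plain Python.
def words(text: str) -> list[str]:
    return text.lower().split()


def count(items: list[str]) -> dict[str, int]:
    totals: dict[str, int] = {}
    for item in items:
        totals[item] = totals.get(item, 0) + 1
    return totals


def largest(totals: dict[str, int]) -> str:
    return max(totals, key=lambda key: totals[key])


# Level 1: written in the level 0 vocabulary.
def word_frequencies(text: str) -> dict[str, int]:
    return count(words(text))


def most_common_word(text: str) -> str:
    return largest(word_frequencies(text))


# Level 2: main, written in the level 1 vocabulary.
def main(text: str) -> str:
    return most_common_word(text)
```

In a program this small the levels are overkill, but the shape is the same at scale: a reader of main needs only the meaning of most_common_word, not the details of how it is built.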

A web server will rarely be built in ‘raw’ Python alone, but will normally be built on a software stack of template engines, routing engines, database engines etc.  The language of the project becomes not just Python, but Python plus all those extensions.  Then the project may add its own extensions.

But to work in the project, you have to learn the language of each of the extensions, or at least the language of the extensions being used in the area of the project you are working in.

Every project of any scale is not just built on a language, but on the language plus the set of extensions to that language.

When to apply DSL methodology?

It follows from the concept of ‘the main function describes the program’ that a very simple program such as hello world already achieves the goal of main conveying what the program does. In fact, any project simple enough to be contained in a single file, or unlikely to require changes beyond a month from when the program is written, is too small in scale or time frame to benefit significantly from DSL methodology.

The main relevance of DSL methodology is for long term projects with continued updates, developed by a team producing several releases over a time scale of more than one year.

It is with this type of software that reading what the code does can become a challenge even to the author of that block of code over time.

Implementing DSL methodology

The goals

DSL methodology is simply a slightly different way to view the normal principles of sound software development. The steps to implementing it all strive to achieve these goals.

The goals of DSL methodology are:

  • allow each part of the system to be expressed in the simplest language possible and with the smallest possible language extension
  • keep system specific functionality at the highest layer possible, and avoid buried functionality
  • building blocks should be as generic as possible and able to be understood without considering the overall application

Software modules: group extensions together

Building blocks should, where possible, be grouped into logical modules that together provide a specific type of new functionality.  These modules should be considered to have functionality independent of the central application, and be able to have their own documentation, and perhaps their own repository.

In fact, given that such extensions should not contain the logic of the main application, it may be possible to open source these extensions, even where the main application itself would not be open source.

Leveraging existing Language Extensions

There are ready-made language extensions, most often described as packages, which can give great capabilities.  Selection of these packages becomes very important, as each selection becomes additional ‘extended language’ that the team must become familiar with.

The reality is that, beyond well established and quite generic packages, there are many packages that take steps towards keeping your code simple, but in your usage still leave a program too big to maintain.  The choices then are:

  • extend an existing package
  • build a new package that uses the existing package internally but exposes only a new api
  • use the existing package as inspiration and effectively fork
  • build an alternative package
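The second choice, a new package that uses an existing one internally but exposes only a new API, can be sketched as below. The standard library json module stands in for the ‘existing package’, and read_settings and write_settings are hypothetical names for a project's own settings module. Callers would use only these two functions and never import json themselves.

```python
import json
from typing import Any

# Hypothetical project module, for example settings.py.
# The existing package (json) is an internal detail of this module.


def read_settings(text: str) -> dict[str, Any]:
    """Parse settings text, insisting on a top-level object."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("settings must be a JSON object")
    return data


def write_settings(settings: dict[str, Any]) -> str:
    """Serialise settings in the one layout this project uses."""
    return json.dumps(settings, indent=2, sort_keys=True)
```

The team then learns two words of project vocabulary in place of the full json interface, and the underlying package could later be swapped without touching the callers.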

Conclusion

The end goal is to have a set of smaller packages, some as modules internal to the application and some as separate packages, perhaps even with their own lifecycle, together with the smallest possible core application.
