Building DSLs: Why, When & How?

As outlined on ‘what is a DSL’, both the intent and the implementation of DSLs vary considerably. The two types of internal DSL are most relevant to these pages, and how to implement an external DSL is ‘off topic’ here, but the goals of external DSLs and a DSL-Full are the same, so the why is discussed.

Topics:

Why Create A DSL
- Why Extend an Existing Language?
- Why Create a New Language?
When
- When to extend a language? (almost always)
- When does an extension become a DSL? (Less often)
- When to create a Detached DSL (very rarely)
How to build a DSL?
- Names
- DSL building Tools

Why Create (A) DSL?

Semantically there is a difference between ‘domain specific language’ and ‘a domain specific language’. Domain specific language may be just new words, but ‘a domain specific language’ implies a completely new language.

Why Extend an Existing Language?

As discussed in language, a language becomes extended in order to allow more precise and concise expression of ideas.

Having a new word which replaces a list of phrases is the human language equivalent of, for example, a function that contains several statements. With human language, if the concept conveyed by the new word is used rarely, then it may be better to state the list of phrases than to use a new word which few people will bother to learn. In the computing example, the function is not really new language if people using the function have no reason to remember exactly what the function does. A function used only once is, for that reason, not really new language. The function would need to be used often, or to describe a concept that is itself a building block for another concept, so that the function helps in understanding an even more complex function. If either of these tests is passed, there is reason for functions or other language building blocks to be considered new language.

Why Create a New Language? (Kotlin DSLs)

The previous section described why to create new language, which can extend an existing language with new vocabulary. Creating a new language is a much bigger step and requires further justification, especially considering that learning a new language is a far greater barrier than learning extensions to a language already known.

There are two possible reasons:

There may be no common language starting point for those using the new language; specifically, those learning the resulting detached DSL should not need to know the host language used to build it.
The detached DSL (or semi-detached DSL) may be created to allow expression in a form already known by users of the DSL. For example, kotlinx.html is designed to reflect HTML and allow representing HTML data structures within Kotlin, so that those data structures look more familiar to people who already know HTML (which is itself a language).

If moving to Kotlin from Python, there are new ways to create a detached DSL, and the temptation is to overuse this new capability. It can be important to consider the real reasons for creating a detached DSL.

When?

Can a program ever avoid extending a language?

DSL methodology suggests that almost all programs define new language, at least for the domain of that program. This parallels the way any novel defines some specific language, even if it is only the names of the central characters. In neither case is this usually thought of as ‘new language’; it is simply the way language is always being extended.

In a novel, the equivalent to a DSL typically only results if the novel is written for a specific domain, such as with science fiction or fantasy novels where the ‘universe’ described is in some way different from a universe already familiar to the reader. A new ‘universe’ is the building of a setting for the novel, and that setting could equally be used for other novels.

Almost every program will deal with some concepts not already familiar to a given reader. While this can be minimised by keeping to language features and packages already widely known, there will still be variables and functions which need to be defined. However, even though variables are defined, well chosen names can result in no further definition being needed. An example is the variable ‘i’ when used as a loop iterator.

A program can keep new language to a minimum, but almost every program will create some new language, which means two important guidelines should be followed:

craft definitions so that the new language they create works linguistically
consider when to move new language to a separate DSL

When does ‘additional language’ become a DSL?

The formula for creating a DSL

The quick answer would be that any time code is moved from the main application to a separate package it could be considered a DSL. However, simply moving code to a separate package does not ensure a DSL that is actually workable as a DSL, or that could accurately be described as a DSL.

There are two additional requirements of the code moved into a separate package:

The code must be sufficiently independent of the ‘donor’ application to be used fully by other independent applications, and to be documented and understood without the need to understand the original application.
The code moved must be cohesive, and not simply a set of independent utilities. Although a set of independent utilities may be shared by several applications, if such utilities are entirely general purpose in nature then they do not constitute ‘domain specific language’. Such utility libraries can still be useful but, being general in nature, they require wider uptake as general language enhancements to gain acceptance across the wider use case.

Metaprogramming and Other Magic.

It is common to think of metaprogramming as a clear indicator that a package is best described as a DSL. Metaprogramming (and other DSL tools) can be like super powers that give a language extension supernatural abilities. The Python Briefly DSL package specifically mentions metaprogramming in the first few lines of its description. Metaprogramming is defined as programs manipulating code as data. Since the code can be manipulated as data, what is meant by any given code can be changed, which allows that code to take on new meaning.

Metaprogramming allows code to differ from or become detached from the host language. The reality is that metaprogramming allows creation of detached or semi-detached DSL features, but such features will most often come at the expense of the DSL being more difficult to learn. Caution is needed when using features that can appear magical, because although magic is fun, part of the fun can be that it is hard to understand. Metaprogramming can make the DSL code mimic another external DSL that people already know (as with kotlinx.html), or can allow complex actions to happen in the background, as with Kivy properties or Python dataclasses, but care is needed to ensure these features are productive, rather than just fun at the expense of complexity. Metaprogramming, or treating code as data, does not in itself make a DSL. Languages such as LISP always treat code as data, so is every LISP program an implementation of a DSL?
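As a small illustration of complex actions happening in the background, here is a minimal sketch using Python's standard `dataclasses` module. The `Point` class and its fields are hypothetical, chosen only for this example. Only the fields are written out; the decorator reads the class body and generates the methods.

```python
from dataclasses import dataclass


@dataclass
class Point:
    # Only the fields are declared here. The decorator inspects these
    # annotations and generates __init__, __repr__ and __eq__.
    x: int
    y: int


p = Point(1, 2)            # uses the generated __init__
print(p == Point(1, 2))    # uses the generated __eq__, prints True
print(p != Point(2, 1))    # prints True
```

The class reads as a plain declaration, yet it behaves as if several methods had been written by hand. That is productive here because the generated behaviour is small and well documented; the same technique applied more heavily is where the cost in learnability starts to show.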

It is the new capabilities added that make a DSL useful, not how magical it appears.

Don’t almost all programs build a DSL? No.

Even hello world? (clearly a stretch)

You could really stretch definitions and argue that even hello world builds new vocabulary. In the case of “hello world” the only new vocabulary created is available in a shell. If the program is called “hello”, then the computer gains new syntax in ‘the shell’ such that typing ‘hello’ now does something, and prints “Hello world”. The shell gains “hello” as one new word of vocabulary. This is not a DSL, as the new vocabulary is not available at development time.

The reality is that the functionality provided by ‘main’ is available only to the Operating System, but every other function or class is a new building block available to main.

The Real DSL Test: Code with Classes or functions beyond main.

For any program with functions or classes outside of main, main can make use of those functions or classes as building blocks for the code in main. The rest of the program defines at least some specific language or ‘mini-DSL’ that gives main an extended language to work with.

Yes, main has extended language to work with, and good program design requires that extended language is well designed, but that does not make a DSL unless the extended language can be used in a domain that is broader than a single application.
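The idea of main working with an extended language can be sketched as below. The functions `net_price` and `with_tax`, and the 20% rate, are hypothetical and exist only for this illustration.

```python
def net_price(unit_price: float, quantity: int) -> float:
    """A new 'word' for this program: the price before tax."""
    return unit_price * quantity


def with_tax(amount: float, rate: float = 0.2) -> float:
    """Another building block; the default rate is illustrative only."""
    return amount * (1 + rate)


def main() -> None:
    # main is written in the vocabulary defined above,
    # not in raw arithmetic.
    total = with_tax(net_price(10.0, 3))
    print(f"Total: {total:.2f}")


if __name__ == "__main__":
    main()
```

Here `main` has a ‘mini-DSL’ of two words. By the test above it stays a mini-DSL, since nothing here is packaged for use beyond this one application.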

How to Build a DSL?

Names

Most new vocabulary is simply names. While some novels have far more extended language, almost any work of fiction will need to introduce names for characters, and those names come to have more meaning as more details become associated with them. In some novels the names can seem too similar, and it is harder to remember who is who than in other novels.

There is an art to choosing good names, and naming things is hard.

The simple art of choosing good names is central to building domain specific language as much of any new language will be names.

DSL building tools

Standard tools

Every language has declarations of variables and functions, and classes or types. These allow adding verbs and nouns.

Metaprogramming

This allows writing code that itself writes or modifies other code. This allows code to act as a modifier of other code, in the manner of adjectives/adverbs in conventional language.
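One way to picture code acting as an adverb is a Python decorator, which takes a function and returns a modified version of it. This is a minimal sketch; the names `loudly` and `greet` are hypothetical and chosen only for the illustration.

```python
import functools


def loudly(func):
    """Hypothetical 'adverb': changes how a string-returning function speaks."""
    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper() + "!"
    return wrapper


@loudly
def greet(name):
    return f"hello {name}"


print(greet("world"))  # HELLO WORLD!
```

The verb `greet` is unchanged in its own definition; the modifier applied on the line above it changes how it is performed, much as ‘loudly’ modifies ‘greet’ in a sentence.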

Others?

Various languages offer specific syntax such as decorators or operator overloading that are specifically designed for language extension.
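Operator overloading, for instance, lets existing symbols take on meaning for a new type. The sketch below uses Python's special methods `__add__` and `__mul__`; the `Money` class is hypothetical and holds whole cents only to keep the arithmetic exact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    """Hypothetical domain type; amounts are held in whole cents."""
    cents: int

    def __add__(self, other: "Money") -> "Money":
        # Gives '+' a meaning for two Money values.
        return Money(self.cents + other.cents)

    def __mul__(self, factor: int) -> "Money":
        # Gives '*' a meaning for Money times a whole number.
        return Money(self.cents * factor)


basket = Money(250) * 2 + Money(199)
print(basket.cents)  # 699
```

The expression for `basket` reads in the language of the domain, while the symbols used are ones every reader of the host language already knows.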

Conclusions

Almost every project has Domain Specific Language

Every program that contains functions and classes is building some DSL for use by the main function. So every program builds a ‘partial-DSL’, which generally only becomes a ‘true DSL’ if the functions and classes are separated into a separate package, are reusable, define functionality that is independent of any specific application, and provide toolkit functionality which can extend language capabilities.

Building DSLs: part of the fabric of programming

Defining variables, functions, and classes is a major part of programming. Even the algorithms are wrapped in functions, and then become part of the vocabulary. Almost every part of coding is building new vocabulary, and building vocabulary is a core part of extending a language, regardless of whether the resulting language then becomes a DSL. The only question is whether the language extensions provided by this program become an external DSL, or are more like a ‘glossary’ for the one project.