Language: The Core of DSL design

For both standalone DSLs and language extensions, to build a DSL is to build a new language. Being restricted to a specific domain should allow the language to be small and simple, but it is still a new language. How do we keep a new language intuitive and simple? This page looks at the basics for guidance.

  • What is ‘language’ anyway?
    • vocabulary
    • syntax
  • Language Evolution
    • Computer Language
    • Human Language
  • DSLs – Standard Language is only the Starting Point
    • Human Language
      • Jargon
      • Jargon Like Language in Non-Technical Domains
      • Language Augmentation for Every Domain
  • How Do We Learn Languages?
  • Why does language evolve, why do jargons emerge?
  • Conclusion

What is Language anyway?

Vocabulary.

The core of language is vocabulary. Words provide a very concise representation of a concept or object. For example, the word ‘tree’ represents, to most people, something between the dictionary definition of ‘tree’ and the Wikipedia entry for ‘tree’. In fact there are many different meanings of tree depending on the context or ‘domain’ of the discussion. Without a specific context, we would normally assume the botanical tree. Even in that one case, between the dictionary and Wikipedia there is so much information a person may associate with the word that they could spend pages explaining all that is communicated by just one word.

Syntax.

Each language also has a syntax, or set of rules for how words are combined to represent the interactions of the concepts referred to by vocabulary.  While the vocabulary is highly dynamic, constantly changing, syntax tends to be far more stable for any given language.

Language Evolution and Continual Extension.

Computer Languages

Consider the C language. Different people took the language along different paths, to the point that it was seen as necessary to create a definition of ‘Standard C’, known as ANSI C, with that standard being agreed in 1989.

The large number of extensions and lack of agreement on a standard library, together with the language popularity and the fact that not even the Unix compilers precisely implemented the K&R specification, led to the necessity of standardization. (Quoting Wikipedia)

But people not only keep making their own extensions, they also want some components of these extensions to become part of the language itself, leading to updated standards in 1999 and again in 2011. Many things added in a new standard are already widely in use prior to adoption in the standard. Note there are proposals for a new standard for around 2021.

Every language keeps being extended.

Human Language.

It becomes clear from reading Shakespeare that human languages also evolve. They also diverge, with English taking the forms English (UK), English (US), English (Australia) and so on, each of these adding new words every year. It is common practice for each language to have its own authority, e.g. English (UK) has the Oxford Dictionary as its official reference. In a sense, computer languages with their standards bodies are following the same process as human languages. But human language also evolves in advance of standards, as words are only added to references like the Oxford Dictionary once they are already in use. Further, not all words are in the dictionary, which is one reason encyclopedias have long existed: they can contain words such as company names that may be commonly used, but are from categories that are not part of the official language. In fact references like Wikipedia become a common source of ‘what is widely understood’, even though there could be inaccuracies.

Human languages, not just computer languages, keep being extended. Again there are ‘standards’, but language extension happens naturally and in advance of standards.

DSLs – The standard is only the base

Human Language

Jargon

To again quote Wikipedia: Jargon is a type of language that is used in a particular context and may not be well understood outside that context.

The word ‘context’ can be considered a synonym for ‘domain’ as used in ‘Domain Specific Language’ (DSL). The reality is that almost all jargon, as in the many examples Wikipedia lists, provides vocabulary which extends a language, but that vocabulary extension is insufficient for a complete conversation by itself. That is, most communication that makes use of jargon still takes place mostly in a general language such as English, with the jargon providing extra words and extending the language. In this way almost all jargon can be considered an Augmentation DSL. An exception would be maths ‘jargon’, which is not based on an extension of English or another general language. Maths, at least in written form, even uses a different set of symbols than general languages. This clearly puts maths in the External DSL category.

Jargon Like Language in Non-technical Domains.

Is jargon always technical? Is the phrase ‘technical jargon’ an example of a tautology? The Wikipedia list of examples is certainly heavily biased towards technical fields, but it does include the sport of cricket. Is that jargon all related to the technical aspects of cricket?

Now consider Star Wars. Luke and Leia become words that take on a special meaning, but so do Wookiee, porg, Tatooine, Jedi, Sith and ‘The Force’. I feel there are as many words specific to the Star Wars domain as there are in most of the jargon examples.

In fact any novel creates its own specific vocabulary. In the simplest cases that vocabulary may be little more than names, but the fact remains that you have to define some vocabulary for any literary work. Depending on the novel, significant amounts of vocabulary can be specific to it. Consider Lord of the Rings. Not only are characters explored, but also types of creatures, new locations and an imaginary world. It can be described as a “Lord of the Rings Universe” being created.

The new vocabulary of a given novel may fall far short of the vocabulary of Star Wars or Lord of the Rings, and hardly amount to a ‘jargon’ or ‘language augmentation’, but the process of introducing new words that take on meaning throughout a novel is the same regardless of how many or how few words are introduced. The structure for introducing new words is part of language. Every work introduces words that, when read, convey meaning not known at the outset.

Language Augmentation for Every Domain.

Beyond novels, even a household has its own vocabulary, including words with meaning beyond that which would be known by any outsider. Naturally, the names require introduction, but even simple things like the rooms have labels that the family associates with where the room is, what is in the room, and even who normally uses that room at what time. An outsider lacks the domain-specific extra meaning associated with the word ‘bathroom’, so a visitor may need directions. The domain-specific knowledge and meaning added to words enables concise conversation that would be tedious if every word had to be explained every time.

It seems every domain has its own words. Some words, such as names like ‘Bob’, are generally known outside the domain to be a name, but within a specific domain take on a specific meaning: ‘Bob’ becomes not just a name, but a person, with a complete set of associations about that person. Many words simply take on extra, more defined meaning within a domain, while other words can be unique to the domain.
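For readers who think in code, the layering described above can be sketched as a lookup that tries the domain vocabulary first and falls back to the general language. This is only an illustrative analogy, not a model of how people actually process language. Every entry and name below (general, household, computing, vocabulary) is hypothetical.

```python
from collections import ChainMap

# General language: meanings most speakers share (illustrative entries only).
general = {
    "tree": "a woody plant",
    "Bob": "a given name",
    "bathroom": "a room with a bath or toilet",
}

# Hypothetical domains, each refining or adding to the general vocabulary.
household = {
    "Bob": "the uncle who visits on Sundays",
    "bathroom": "second door on the left upstairs",
    "the den": "the small room where the children do homework",
}
computing = {
    "tree": "a hierarchical data structure",
}


def vocabulary(domain):
    """Layer a domain's words over the general language."""
    # ChainMap searches the first mapping, then falls back to the next.
    return ChainMap(domain, general)


at_home = vocabulary(household)
print(at_home["Bob"])    # the domain-specific meaning takes precedence
print(at_home["tree"])   # no household meaning, so the general one is used
print(vocabulary(computing)["tree"])
print("the den" in vocabulary(computing))  # a word unique to another domain
```

An outsider to the household is in the position of someone holding only the general mapping: the word ‘bathroom’ resolves, but without the directions.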

How Do We Learn Languages?

A significant part of the human brain has evolved specifically to process language. While it is clear from the above reflection that we all keep learning language extensions throughout our lives, relatively few of us learn new languages once we become adults.

The message is “language extension easy, new language much harder”.

When a new concept arises, we can start by describing it in full using a combination of existing words, much like quoting a dictionary definition. Adopting a single word to replace repeating the description over and over is very natural.

It is also observed that human languages exist in related families, and learning another language from the same family is easier than learning one from a new language family. So if you know Spanish, learning Italian may be easier than learning Hungarian. Many computer languages are also related in families, making the move from one computer language to another easier. If it is necessary to build an external or detached DSL, can it be made simpler to learn by relating it to an existing and well known DSL?

Similarly, we can learn a new word from the dictionary or encyclopedia provided we can already understand the language used in the dictionary definition or encyclopedia entry. Extending computer languages is similar: building a definition of a new language element using the existing language is familiar, and that definition can easily be found in order to learn about the new language element. It is when we meet completely new language that understanding enough to even find the definition becomes difficult.
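As a minimal sketch of that idea, here is how a program might add two new ‘words’ to Python, each defined entirely in language the reader already knows. The household-budget domain and the names monthly_total and over_budget are assumptions made up for this illustration.

```python
# Hypothetical domain: a household budget. All names are illustrative.

def monthly_total(amounts):
    """Sum of every amount recorded for the month."""
    # Defined purely with existing language: the built-in sum().
    return sum(amounts)


def over_budget(amounts, limit):
    """True when the monthly total exceeds the agreed limit."""
    # A new word built on an earlier new word plus existing syntax.
    return monthly_total(amounts) > limit


groceries = [120.0, 85.5, 60.25]

# The single word 'over_budget' now stands in for the full description.
print(over_budget(groceries, 250.0))

# The 'dictionary definition' is easy to find from the word itself.
print(over_budget.__doc__)
```

A reader who meets over_budget for the first time can look up its definition and read it without learning any new syntax, much like looking up an unfamiliar word in a dictionary written in a language they already speak.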

Why does language evolve, why do jargons emerge?

Language exists to enable communication. New language arises because it improves communication. You can substitute any word in English with its meaning from the dictionary, but that replaces a single word with one or more phrases, and sometimes loses some precision about exactly what is meant. The invention of a new word can do the opposite: it can reduce one or more phrases to a single word, and convey even more precise meaning by doing so.

That is why new words, and new languages, emerge: creating a new single word can convey meaning that would otherwise require several phrases, and the single word can even convey that meaning better than the phrases. But there is a trade-off: people have to learn the new word.

Jargon arises because some new words are useful only in one specific domain, so for many people outside that specific domain there is insufficient benefit to learning those words.

The benefit of new words and additional language is the ability to communicate more precisely and concisely. The new language ‘works’ when the benefit outweighs the effort of learning that additional language.

Conclusion.

We are on a journey towards computers with ‘natural’ language. Since the first move from machine code to assembler, the goal has been for computer programs to be readable by humans as well as by machines. Computer language continues to evolve, to express concepts specific to computing and to allow humans to interact with computer systems.

Extending computer languages in ways that mirror how we extend human languages is, in practice, leveraging how the human brain processes language.

Specifically, the following points stand out from considering human languages:

  • New words and new language allow more precise and concise expression, at the cost of learning the additional language or words.
  • DSLs, or computer ‘jargons’, can be defined as language extensions for repeated use across a domain.
  • Given that extending language just for a single work of fiction is a well proven model, a single program extending language should also be highly practical.
  • External DSLs and Detached DSLs that do not extend an existing language should be used very carefully, considering the difficulty of learning completely new human languages.
