Haoyi's Programming Blog

Table of Contents

Build your own Programming Language with Scala

Posted 2019-09-11
Easy Parallel Programming with Scala FuturesSimple Web and Api Servers with Scala

One strength of Scala is implementing programming languages. Even if your goal is not to implement an entirely new programming language, these techniques are still useful: for writing linters, program analyzers, query engines, and other such tools. This blog post will walk you through the process of implementing a simple programming language in Scala, introduce some of the fundamental concepts involved, and finishing with a working interpreter for a simple programming language.


The goal of this exercise will to implement a subset of the Jsonnet programming language:

input.jsonnet

local greeting = "Hello ";
local person = function(name){
  "name": name,
  "welcome": greeting + name + "!"
};
{
  "person1": person("Alice"),
  "person2": person("Bob"),
  "person3": person("Charlie")
}

output.json

{
  "person1": {
    "name": "Alice",
    "welcome": "Hello Alice!"
  },
  "person2": {
    "name": "Bob",
    "welcome": "Hello Bob!"
  },
  "person3": {
    "name": "Charlie",
    "welcome": "Hello Charlie!"
  }
}

Jsonnet is a simple language meant to construct JSON configuration files: the output of evaluating a .jsonnet file is a single JSON structure containing dictionaries, lists, strings, numbers, and booleans. The output JSON can then be used to configure Kubernetes, CloudFormation, Terraform, or other software systems. Jsonnet is used heavily at Google, Databricks, and other companies in order to manage their large and complex system configurations.

Our interpreter will roughly follow the following steps:

This tutorial will walk you through all three phases of implementing our simple interpreter, giving you an intuition for the techniques, data structures and algorithms involved. While not a comprehensive or production-ready implementation, this tutorial should hopefully give you enough foundation to get started with your own simple language-related projects.

For this exercise, we will do all our work inside the Ammonite Scala REPL. Ammonite can be installed via:

$ sudo sh -c '(echo "#!/usr/bin/env sh" && curl -L https://github.com/lihaoyi/Ammonite/releases/download/1.6.8/2.13-1.6.8) > /usr/local/bin/amm && chmod +x /usr/local/bin/amm' && amm

Loading...
Welcome to the Ammonite Repl 1.6.8
(Scala 2.13.0 Java 11.0.2)
If you like Ammonite, please support our development at www.patreon.com/lihaoyi
@

Once the prompt is open, we are ready to begin.

The Jsonnet Language

local greeting = "Hello ";
local person = function(name){
  "name": name,
  "welcome": greeting + name + "!"
};
{
  "person1": person("Alice"),
  "person2": person("Bob"),
  "person3": person("Charlie")
}

Jsonnet itself is similar to JSON, but introduces the constructs that help you tidy up verbose or repetitive portions of your JSON config: local variables like greeting above, function definitions like person, and basic operations like concatenating strings with +. For the purpose of this exercise, we will stop at implementing these three simple features. Further features and improvements can be implemented the same way, and we will leave the task of exhaustively implementing the entire Jsonnet language to production intepreters like Sjsonnet.

Primitives

Jsonnet has a similar set of primitives as JSON. For this tutorial, we will consider just a subset of them:

"hello world" // strings
{"key": "value", "thing": "123"} // dictionaries

For now, we will assume strings do not contain escape sequences like \n or \", and leave out additional data types like numbers, booleans, arrays, or null.

We will also assume strings only support a single operation: concatenation via +.

"hello" + "world"
"helloworld"

Locals

You can define local variables using the local keyword, the assignment of an expression to a name, a semicolon ;, and then the final expression:

local greeting = "Hello ";
greeting + greeting
"Hello Hello "

You can define local functions by combining the local keyword with a function expression:

local hello = function(name) "Hello " + name;
hello("Bob")
"Hello Bob"

As in most programming languages, functions are called using parentheses containing the arguments you wish to pass.

Local variables and functions can also return structured data in dictionaries:

local person = function(name) {
  "name": name,
  "welcome": "Hello " + name + '!',
};
person("Bob")
{
  "name": "Bob",
  "welcome": "Hello Bob!",
};

Composition

The language features described here can be combined in arbitrary ways, e.g. a function can be called inside a dictionary value:

local f = function(x) "Hello " + x;
{"key": "value", "thing": f("World")}
{
  "key": "value",
  "thing": "Hello World"
}

And that function may itself return a dictionary, that gets nested within:

local f = function(x) {"nested key": "Hello " + x};
{"key": "value", "thing": f("World")}
{
  "key": "value",
  "thing": {
    "nested key": "Hello World"
  }
}

This quick tour is only a small subset of the full Jsonnet language, but will be enough for this tutorial.

Parsing Jsonnet

To parse Jsonnet, we will be using the Fastparse library introduced in this blog post:

Defining the Syntax Tree

Let's start by defining a syntax tree of what our minimal Jsonnet syntax looks like: strings, dictionaries, functions, and local definitions. We will do so using a Scala sealed trait and case classes:

sealed trait Expr
object Expr{
  case class Str(s: String) extends Expr
  case class Ident(name: String) extends Expr
  case class Plus(nodes: Seq[Expr]) extends Expr
  case class Dict(pairs: Map[String, Expr]) extends Expr
  case class Local(name: String, assigned: Expr, body: Expr) extends Expr
  case class Func(argNames: Seq[String], body: Expr) extends Expr
  case class Call(expr: Expr, args: Seq[Expr]) extends Expr
}

(To enter this code into the Ammonite REPL, wrap it in a pair of curlies {...} so it is entered as a single unit)

Here, the Expr data structure is meant to represent the meaningful parts of the Jsonnet syntax. Here are some example snippets and what we expect them to parse to.

Simple strings:

"hello"
Str("hello")

Dictionaries:

{"hello": "world", "123": "456"}
Dict(Map("hello" -> Str("world"), "123" -> Str("456")))

Functions:

function(a, b) a + " " + b 
Func(Seq("a", "b"), Plus(Seq(Ident("a"), Str(" "), Ident("b"))))

Local variable declarations:

local f = function(a) a + "1";
f("123")
Local("f", Func(Seq("a"), Plus(Seq(Ident("a"), Str("1"))), Call(Ident("f"), Str(123))

And of course, we expect to be able to parse any combination of these language features combined together.

Parsing Terminals

To begin using Fastparse, we will use the following imports:

@ import fastparse._, MultiLineWhitespace._
import fastparse._, MultiLineWhitespace._

First, let's write the parser for Str. We will ignore escape characters for simplicity, meaning a string is simply a ", followed by zero-or-more non-" characters, closed by another ":

@ def str[_: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" ).map(Expr.Str)
defined function str

@ fastparse.parse("\"hello\"", str(_))
res10: Parsed[Str] = Success(Str("hello"), 7)

@ fastparse.parse("\"hello world\"", str(_))
res11: Parsed[Str] = Success(Str("hello world"), 13)

@ fastparse.parse("\"\"", str(_))
res12: Parsed[Str] = Success(Str(""), 2)

@ println(fastparse.parse("123", str(_)))
Parsed.Failure(Position 1:1, found "123")

Note how we use the ~~/ operator after the "\"" open quote: the ~~ means we do not want to consume whitespace here (since we are inside a string) and the / is a Fastparse Cut meaning we want to avoid backtracking if the parse fails. A detailed discussion of what Cuts give us is left to the linked documentation.

Identifiers are similar: an alphabet or underscore, followed by zero or more alphanumeric characters.

@ def ident[_: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!.map(Expr.Ident)
defined function ident

@ fastparse.parse("hello", ident(_))
res17: Parsed[Ident] = Success(Ident("hello"), 5)

@ println(fastparse.parse("123", ident(_)))
Parsed.Failure(Position 1:1, found "123")

Parsing Recursive Nodes

The next syntax tree node we will look into parsing is Expr.Plus, used to model the a + b syntax.

The Expr.Plus nodes representing +, and all other case classes in our syntax tree, are a bit more complex: they have a recursive definition where Plus is an Expr, but an Expr could be a Plus node. This can be solved by making our parsers recursive, as follows:

@ {
  def expr[_: P]: P[Expr] = P( prefixExpr ~ plus.rep ).map{
    case (e, Nil) => e
    case (e, items) => Expr.Plus(e +: items)
  }
  def prefixExpr[_: P] = P( str | ident )

  def str[_: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" ).map(Expr.Str)
  def ident[_: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!.map(Expr.Ident)
  def plus[_: P] = P( "+" ~ prefixExpr )
  }
  
@ fastparse.parse("a + b", expr(_))
res65: Parsed[Expr] = Success(Plus(List(Ident("a"), Ident("b"))), 5)

@ fastparse.parse("a + b + c", expr(_))
res66: Parsed[Expr] = Success(Plus(List(Ident("a"), Ident("b"), Ident("c"))), 9)

@ fastparse.parse("a + \" \" + c", expr(_))
res67: Parsed[Expr] = Success(Plus(List(Ident("a"), Str(" "), Ident("c"))), 11)

Note that we cannot simply define plus as expr ~ "+" ~ expr; this is because the plus parser would then be Left Recursive, causing an infinite recursion at parse time. Instead, we need to define plus as just the suffix "+" ~ prefixExpr, and have the expr parser do the work of repeating plus via ~ plus.rep and aggregating the results into a Expr.Plus node if non-empty.

Expr.Dict nodes are also recursive, each comma-separated key-value pair containing a Expr.Str key and a Expr value. We can parse those as follows:

@ {
  def expr[_: P]: P[Expr] = P( prefixExpr ~ plus.rep).map{
    case (e, Nil) => e
    case (e, items) => Expr.Plus(e +: items)
  }
  def prefixExpr[_: P] = P( str | ident | dict )

  def str[_: P] = P( str0 ).map(Expr.Str)
  def str0[_: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" )
  def ident[_: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!.map(Expr.Ident)
  def plus[_: P] = P( "+" ~ prefixExpr )
  def dict[_: P] = P( "{" ~/ (str0 ~ ":" ~/ expr).rep(0, ",") ~ "}" ).map(kvs => Expr.Dict(kvs.toMap))
  }

@ fastparse.parse("""{"a": "b", "cde": id}""", expr(_))
res84: Parsed[Expr] = Success(Dict(Map("a" -> Str("b"), "cde" -> Ident("id"))), 21)

@ fastparse.parse("""{"a": "b", "cde": id}""", expr(_))
res85: Parsed[Expr] = Success(Dict(Map("a" -> Str("b"), "cde" -> Ident("id"))), 21)

@ fastparse.parse("""{"a": "b", "cde": id, "nested": {}}""", expr(_))
res86: Parsed[Expr] = Success(
  Dict(Map("a" -> Str("b"), "cde" -> Ident("id"), "nested" -> Dict(Map()))),
  35
)

Note how we extracted the str0 parser from str: str0 returns the raw String that was parsed, while str wraps it in an Expr.Str syntax tree node. Since Expr.Dict keys are syntactically the same as Expr.Strs, but do not need to be wrapped in Expr.Str nodes, we can re-use the str0 parser in our dict parser to parse them as well.

Completing the Parser

Adding the parsers for func, local and call to this code, we get the following:

object Parser {
  def expr[_: P]: P[Expr] = P( prefixExpr ~ plus.rep ).map{
    case (e, Nil) => e
    case (e, items) => Expr.Plus(e +: items)
  }
  def prefixExpr[_: P]: P[Expr] = P( callExpr ~ call.rep ).map{ case (e, items) =>
    items.foldLeft(e)((f, args) => Expr.Call(f, args))
  }
  def callExpr[_: P] = P( str | dict | local | func | ident )

  def str[_: P] = P( str0 ).map(Expr.Str)
  def str0[_: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" )
  def ident[_: P] = P( ident0 ).map(Expr.Ident)
  def ident0[_: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!

  def dict[_: P] = P( "{" ~/ (str0 ~ ":" ~/ expr).rep(0, ",") ~ "}" ).map(kvs => Expr.Dict(kvs.toMap))
  def local[_: P] = P( "local" ~/ ident0 ~ "=" ~ expr ~ ";" ~ expr ).map(Expr.Local.tupled)
  def func[_: P] = P( "function" ~/ "(" ~ ident0.rep(0, ",") ~ ")" ~ expr ).map(Expr.Func.tupled)

  def plus[_: P] = P( "+" ~ prefixExpr )
  def call[_: P] = P( "(" ~/ expr.rep(0, ",") ~ ")" )
}

func and local are relatively straightforward: each one starts with a keyword, and they can be parsed recursively without issue. We have also split out ident0 from ident, since the func parser uses the same syntax as ident for parsing its parameter list but does not need the identifiers to be boxed into Expr.Ident syntax tree nodes.

Note that call also needs us to split up prefixExpr further into callExpr, since the a() call syntax would otherwise be left-recursive similar to the a + b plus syntax we saw earlier.

We can test the parser manually and see that it does what we want:

@ fastparse.parse("\"123\"", Parser.expr(_))
res63: Parsed[Expr] = Success(Str("123"), 5)

@ fastparse.parse("id", Parser.expr(_))
res64: Parsed[Expr] = Success(Ident("id"), 2)

@ fastparse.parse("a + b", Parser.expr(_))
res65: Parsed[Expr] = Success(Plus(Ident("a"), Ident("b")), 5)

@ fastparse.parse("a + b + c", Parser.expr(_))
res66: Parsed[Expr] = Success(Plus(Ident("a"), Plus(Ident("b"), Ident("c"))), 9)

@ fastparse.parse("""{"a": "A", "b": "bee"}""", Parser.expr(_))
res69: Parsed[Expr] = Success(Dict(Map("a" -> Str("A"), "b" -> Str("bee"))), 22)

@ fastparse.parse("""f()(a) + g(b, c)""", Parser.expr(_))
res95: Parsed[Expr] = Success(
  Plus(
    List(
      Call(Call(Ident("f"), List()), List(Ident("a"))),
      Call(Ident("g"), List(Ident("b"), Ident("c")))
    )
  ),
  16
)

The syntax of our programming language is recursive: local, function, plus, and dict expressions can contain other expressions, nested arbitrarily deeply. We can test this out by feeding such nested examples into the expr parser:

@ fastparse.parse(
    """local variable = "kay"; {"a": "A", "f": function(a) a + a, "nested": {"k": variable}}""",
    Parser.expr(_)
  )
res74: Parsed[Expr] = Success(
  Local(
    "variable",
    Str("kay"),
    Dict(
      Map(
        "a" -> Str("A"),
        "f" -> Func(List("a"), Plus(Ident("a"), Ident("a"))),
        "nested" -> Dict(Map("k" -> Ident("variable")))
      )
    )
  ),
  85
)

Evaluating the Syntax Tree

Now that we have a syntax tree of Expr nodes, the next step is to interpret the syntax tree to provide a runtime data structure of Values. We will define Value as follows:

sealed trait Value
object Value{
  case class Str(s: String) extends Value
  case class Dict(pairs: Map[String, Value]) extends Value
  case class Func(call: Seq[Value] => Value) extends Value
}

(To enter this code into the Ammonite REPL, wrap it in a pair of curlies {...} so it is entered as a single unit)

Note that while the Expr syntax tree contains nodes that represent identifiers, local variables, function application, and so on, a Value can only be a Str, a Dict, or a Func. It doesn't matter where a Value.Str came from: whether a literal Expr.Str in the source code, passed in as a function parameter to a Expr.Func, or bound to a local variable via Expr.Local, it is still the same Value.Str. The final Value for the entire Jsonnet program is then converted to a JSON string as the output of the program.

The contents of Value.Str and Value.Dict should be self explanatory. Value.Func is a bit less obvious: by defining it as Func(call: Seq[Value] => Value), we are saying "a function is something you can pass a list of argument values to and returns a value". We will see how to instantiate these Value.Func nodes later.

The basic task here is to write a function that converts a Expr into a Value:

def evaluate(expr: Expr): Value

However, the Value returned by evaluating an Expr isn't only dependent on the contents of that Expr: it also depends on the enclosing scope, as the value of Expr.Ident identifiers depends on what value was bound to that name, via local declarations or function parameters.

This mapping of names to Values is often known as the "lexical scope" within your program. Thus we might instead define evaluate as:

def evaluate(expr: Expr, scope: Map[String, Value]): Value

From here we can start fleshing it out.

Evaluating Literals and Plus

The simplest case is literal Expr.Str, which are evaluated to the identical Value.Str:

def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match{
  case Expr.Str(s) => Value.Str(s)
}

We can then test this on a simple literal strings:

@ evaluate(fastparse.parse("\"hello\"", Parser.expr(_)).get.value, Map.empty)
res79: Value = Str("hello")

And see that it gives us what we want.

Literal dictionaries are also straightforward: Expr.Dicts become Value.Dicts, with the same keys, except we need to evaluate each value into its corresponding expression:

case Expr.Dict(kvs) => Value.Dict(kvs.map{case (k, v) => (k, evaluate(v, scope))})

Dictionary literals do not add or remove anything from the lexical scope, so the scope parameter is passed through unchanged. Testing it gives us:

@ evaluate(
    fastparse.parse("""{"hello": "world", "key": "value"}""", Parser.expr(_)).get.value,
    Map.empty
  )
res81: Value = Dict(Map("hello" -> Str("world"), "key" -> Str("value")))

Next, we will look at the Expr.Plus nodes. We have only defined their behavior to work on string values (Value.Str), so evaluating them is involves:

case Expr.Plus(items) =>
  Value.Str(items.map(evaluate(_, scope)).map{case Value.Str(s) => s}.mkString)
@ evaluate(fastparse.parse("\"hello\" + \"world\"", Parser.expr(_)).get.value, Map.empty)
res5: Value = Str("helloworld")

The evaluate function so far looks like this:

def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match{
  case Expr.Str(s) => Value.Str(s)
  case Expr.Dict(kvs) => Value.Dict(kvs.map{case (k, v) => (k, evaluate(v, scope))})
  case Expr.Plus(items) =>
    Value.Str(items.map(evaluate(_, scope)).map{case Value.Str(s) => s}.mkString)
}

Evaluating Locals and Identifiers

The next thing we will look at is evaluating the local variable syntax:

@ fastparse.parse(
    """local greeting = "Hello ";greeting + greeting""",
    Parser.expr(_)
  )
res85: Parsed[Expr] = Success(
  Local("greeting", Str("Hello "), Plus(Ident("greeting"), Ident("greeting"))),
  45
)

These are represented as Expr.Local nodes in our syntax tree:

case class Local(name: String, assigned: Expr, body: Expr) extends Expr

The point of local is to evaluate the assigned expression to a value, assign that value to the name, and then evaluate the body expression with that value bound to that name. We can write that in code as follows:

case Expr.Local(name, assigned, body) =>
  val assignedValue = evaluate(assigned, scope) 
  evaluate(body, scope + (name -> assignedValue)) 

Once a local has put a name into scope, evaluating Ident identifier nodes is straightforward: simply fetch the value of that name in the scope:

case Expr.Ident(name) => scope(name) 

We can test it to see that it works in the success case:

@ evaluate(
    fastparse.parse("""local greeting = "Hello "; greeting + greeting""", expr(_)).get.value,
    Map.empty
  )
res94: Value = Str("Hello Hello ")

Even when multiple locals are chained together:

@ evaluate(
    fastparse.parse("""local x = "Hello "; local y = "world"; x + y""", expr(_)).get.value,
    Map.empty
  )
res96: Value = Str("Hello world")

And also that it fails when an identifier is not found:

@ evaluate(
    fastparse.parse("""local greeting = "Hello "; nope + nope""", expr(_)).get.value,
    Map.empty
  )
java.util.NoSuchElementException: key not found: nope
  scala.collection.immutable.Map$Map1.apply(Map.scala:242)
  ammonite.$sess.cmd93$.evaluate(cmd93.sc:10)
  ammonite.$sess.cmd93$.evaluate(cmd93.sc:5)
  ammonite.$sess.cmd95$.<clinit>(cmd95.sc:3)

Generating a nicer-looking error message is left as an exercise for the reader.

The evaluate function now looks like:

def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match{
  case Expr.Str(s) => Value.Str(s)
  case Expr.Dict(kvs) => Value.Dict(kvs.map{case (k, v) => (k, evaluate(v, scope))})
  case Expr.Plus(items) =>
    Value.Str(items.map(evaluate(_, scope)).map{case Value.Str(s) => s}.mkString)
  case Expr.Local(name, assigned, body) =>
    val assignedValue = evaluate(assigned, scope) 
    evaluate(body, scope + (name -> assignedValue)) 
  case Expr.Ident(name) => scope(name) 
}

Evaluating Functions

The last thing we have left to evaluate are the Expr.Func function literals and Expr.Call function application nodes:

case class Func(params: Seq[String], body: Expr) extends Expr
case class Call(expr: Expr, args: Seq[Expr]) extends Expr

Evaluating an Expr.Func should give us a Value.Func:

case class Func(call: Seq[Value] => Value) extends Value

And evaluating an Expr.Call on a Value.Func should give us the result of evaluating that function. The result could be a Value.Str, a Value.Dict, or even another Value.Func.

Evaluating an Expr.Call node is straightforward: we simply evaluate the expr: Expr into a Value.Func, evaluate the args: Seq[Expr] into a sequence of argument values, and then call the Value.Func#call function on the evaluated argument values to give us the result

case Expr.Call(expr, args) =>
  val Value.Func(call) = evaluate(expr, scope)
  val evaluatedArgs = args.map(evaluate(_, scope))
  call(evaluatedArgs)

The trickiness here is: how do we evaluate the Expr.Func to produce a Value.Func whose call attribute does what we want?

When you think about what calling a function really means, it boils down to four steps:

In code, this looks like the following:

case Expr.Func(argNames, body) =>
  Value.Func(args => evaluate(body, scope ++ argNames.zip(args)))

We can then test this with some of the functions we saw earlier:

@ evaluate(
    fastparse.parse(
      """local f = function(a, b) a + " " + b; f("hello", "world")""",
      expr(_)
    ).get.value,
    Map.empty
  )
res119: Value = Str("hello world")

@ evaluate(
    fastparse.parse(
      """local hello = function(name) "Hello " + name; hello("Bob")""",
      expr(_)
    ).get.value,
    Map.empty
  )
res120: Value = Str("Hello Bob")
@ evaluate(
    fastparse.parse(
      """local greeting = "Hello ";
         local person = function (name) {
           "name": name,
           "welcome": greeting + name + "!"
         };
         {
           "person1": person("Alice"),
           "person2": person("Bob"),
           "person3": person("Charlie")
         }
      """,
      expr(_)
    ).get.value,
    Map.empty
  )
res132: Value = Dict(
  Map(
    "person1" -> Dict(Map("name" -> Str("Alice"), "welcome" -> Str("Hello Alice!"))),
    "person2" -> Dict(Map("name" -> Str("Bob"), "welcome" -> Str("Hello Bob!"))),
    "person3" -> Dict(Map("name" -> Str("Charlie"), "welcome" -> Str("Hello Charlie!")))
  )
)

The evaluate function now looks like:

def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match{
  case Expr.Str(s) => Value.Str(s)
  case Expr.Dict(kvs) => Value.Dict(kvs.map{case (k, v) => (k, evaluate(v, scope))})
  case Expr.Plus(items) =>
    Value.Str(items.map(evaluate(_, scope)).map{case Value.Str(s) => s}.mkString)
  case Expr.Local(name, assigned, body) =>
    val assignedValue = evaluate(assigned, scope)
    evaluate(body, scope + (name -> assignedValue))
  case Expr.Ident(name) => scope(name)
  case Expr.Call(expr, args) =>
    val Value.Func(call) = evaluate(expr, scope)
    val evaluatedArgs = args.map(evaluate(_, scope))
    call(evaluatedArgs)
  case Expr.Func(argNames, body) =>
    Value.Func(args => evaluate(body, scope ++ argNames.zip(args)))
}

Outputting JSON

In the last two sections, we saw how to parse an input String of Jsonnet source code into an Expr syntax tree structure, and then evaluate the Expr into a Value data structure that represents a JSON value. In this last section of the tutorial, we will see how to serialize the Value data structure to an output JSON String.

The signature of our serialize function looks like this:

def serialize(v: Value): String = ???

Given a Value can only be a Value.Str, Value.Dict, or Value.Func, we can expand this into:

def serialize(v: Value): String = v match{
  case Value.Str(s) => ???
  case Value.Dict(kvs) => ??? 
}

For now, we will assume there are no longer any Value.Func nodes in the final data structure, as functions cannot be serialized to JSON. Production implementations of Jsonnet like google/jsonnet or Sjsonnet simply raise an error if an un-called function is present in the output.

Since earlier we assumed there are no escape sequences in our strings, we can serialize Value.Str nodes by simply adding double quotes "s around them:

def serialize(v: Value): String = v match{
  case Value.Str(s) => "\"" + s + "\""
  case Value.Dict(kvs) => ??? 
}

As for Value.Dicts, we should serialize each key-value pair as "foo": "bar", separated by commas, and wrap the whole thing in curly brackets:

def serialize(v: Value): String = v match{ def serialize(v: Value): String = v match{
  case Value.Str(s) => "\"" + s + "\""
  case Value.Dict(kvs) => 
    kvs.map{case (k, v) => "\"" + k + "\": " + serialize(v)}.mkString("{", ", ", "}")
}

And we can test this with the Jsonnet programs we saw earlier:

@ println(
    serialize(
      evaluate(
        fastparse.parse(
          """{"hello": "world", "key": "value"}""",
          expr(_)
        ).get.value,
        Map.empty
      )
    )
  )
{"hello": "world", "key": "value"}

@ println(
    serialize(
      evaluate(
        fastparse.parse(
          """local x = "Hello "; local y = "world"; x + y""",
          expr(_)
        ).get.value,
        Map.empty
      )
    )
  )
"Hello world"

@ println(
    serialize(
      evaluate(
        fastparse.parse(
          """local greeting = "Hello ";
           local person = function (name) {
             "name": name,
             "welcome": greeting + name + "!"
           };
           {
             "person1": person("Alice"),
             "person2": person("Bob"),
             "person3": person("Charlie")
           }""",
          expr(_)
        ).get.value,
        Map.empty
      )
    )
  )
{
  "person1": {"name": "Alice", "welcome": "Hello Alice!"}, 
  "person2": {"name": "Bob", "welcome": "Hello Bob!"}, 
  "person3": {"name": "Charlie", "welcome": "Hello Charlie!"}
}

Now all we need to do is wrap the entire parse -> evaluate -> serialize pipeline in a nice helper function:

@ def jsonnet(input: String) = {
    serialize(evaluate(fastparse.parse(input, expr(_)).get.value, Map.empty))
  }

And it's ready for use!

@ println(jsonnet(
  """local greeting = "Hello ";
     local person = function (name) {
       "name": name,
       "welcome": greeting + name + "!"
     };
     {
       "person1": person("Alice"),
       "person2": person("Bob"),
       "person3": person("Charlie")
     }"""
  ))
{
  "person1": {"name": "Alice", "welcome": "Hello Alice!"}, 
  "person2": {"name": "Bob", "welcome": "Hello Bob!"}, 
  "person3": {"name": "Charlie", "welcome": "Hello Charlie!"}
}

Conclusion

In this tutorial, we have walked through how to use Scala to implement an interpreter for a simple programming language: a subset of Jsonnet. We used Fastparse to parse the source code string into an Expr syntax tree, recursively evaluated the Expr syntax tree into a Value data structure, and serialized the Value data structure to a JSON String. We have done so in just 65 lines of code, shown below:

sealed trait Expr
object Expr{
  case class Str(s: String) extends Expr
  case class Ident(name: String) extends Expr
  case class Plus(nodes: Seq[Expr]) extends Expr
  case class Dict(pairs: Map[String, Expr]) extends Expr
  case class Local(name: String, assigned: Expr, body: Expr) extends Expr
  case class Func(argNames: Seq[String], body: Expr) extends Expr
  case class Call(expr: Expr, args: Seq[Expr]) extends Expr
}
object Parser {
  def expr[_: P]: P[Expr] = P( prefixExpr ~ plus.rep ).map{
    case (e, Nil) => e
    case (e, items) => Expr.Plus(e +: items)
  }
  def prefixExpr[_: P]: P[Expr] = P( callExpr ~ call.rep ).map{ case (e, items) =>
    items.foldLeft(e)((f, args) => Expr.Call(f, args))
  }
  def callExpr[_: P] = P( str | dict | local | func | ident )

  def str[_: P] = P( str0 ).map(Expr.Str)
  def str0[_: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" )
  def ident[_: P] = P( ident0 ).map(Expr.Ident)
  def ident0[_: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!

  def dict[_: P] = P( "{" ~/ (str0 ~ ":" ~/ expr).rep(0, ",") ~ "}" ).map(kvs => Expr.Dict(kvs.toMap))
  def local[_: P] = P( "local" ~/ ident0 ~ "=" ~ expr ~ ";" ~ expr ).map(Expr.Local.tupled)
  def func[_: P] = P( "function" ~/ "(" ~ ident0.rep(0, ",") ~ ")" ~ expr ).map(Expr.Func.tupled)

  def plus[_: P] = P( "+" ~ prefixExpr )
  def call[_: P] = P( "(" ~/ expr.rep(0, ",") ~ ")" )
}
sealed trait Value
object Value{
  case class Str(s: String) extends Value
  case class Dict(pairs: Map[String, Value]) extends Value
  case class Func(call: Seq[Value] => Value) extends Value
}

def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match{
  case Expr.Str(s) => Value.Str(s)
  case Expr.Dict(kvs) => Value.Dict(kvs.map{case (k, v) => (k, evaluate(v, scope))})
  case Expr.Plus(items) =>
    Value.Str(items.map(evaluate(_, scope)).map{case Value.Str(s) => s}.mkString)
  case Expr.Local(name, assigned, body) =>
    val assignedValue = evaluate(assigned, scope)
    evaluate(body, scope + (name -> assignedValue))
  case Expr.Ident(name) => scope(name)
  case Expr.Call(expr, args) =>
    val Value.Func(call) = evaluate(expr, scope)
    val evaluatedArgs = args.map(evaluate(_, scope))
    call(evaluatedArgs)
  case Expr.Func(argNames, body) =>
    Value.Func(args => evaluate(body, scope ++ argNames.zip(args)))
}

def serialize(v: Value): String = v match{
  case Value.Str(s) => "\"" + s + "\""
  case Value.Dict(kvs) =>
    kvs.map{case (k, v) => "\"" + k + "\": " + serialize(v)}.mkString("{", ", ", "}")
}

def jsonnet(input: String): String = {
  serialize(evaluate(fastparse.parse(input, Parser.expr(_)).get.value, Map.empty))
}

While we did all the work so far in a REPL, you can also paste this code into a Scala Script, or include it in a larger Scala project built using Mill or SBT.

In the interest of time, this tutorial is greatly simplified. Nevertheless, the same techniques apply to implementing most interpreted programming languages, including production-ready implementations like Sjsonnet. Hopefully this post has given you an intuition for how a simple programming language is implemented. Even if you never implement your own programming language in future, these techniques may still come in useful if you find yourself writing linters, program analyzers, query engines, and other such tools.

If you are interested in learning more, here are some simple exercises that extend the interpreter we built above, which you can try out to get a better feel of how this interpreter works:

(function (x) x + x)("hello") // "hellohello"
let my_array = [2, 4, 6, 8];
let index = 4 / 2;
my_array[index + 1] // 6
local f = function(x) x + x;
f + f
// Error: binary operator + does not operate on functions.
//	example1.jsonnet:2

"hello"("world")
// Error: only functions can be called, got string
//	example1.jsonnet:1:1-17	
local greeting = "Hello ";
local person = function(name){
  "name": name,
  "welcome": greeting + name + "!",
};
{
  "person1": person("Alice"),
  "person2": person("Bob"),
  "person3": person("Charlie"),
}

If you're interested in diving deeper into the world of programming language implementation, a good next step would be the following e-book:


About the Author: Haoyi is a software engineer, an early contributor to Scala.js, and the author of many open-source Scala tools such as the Ammonite REPL and FastParse.

If you've enjoyed this blog, or enjoyed using Haoyi's other open source libraries, please chip in (or get your Company to chip in!) via Patreon so he can continue his open-source work


Easy Parallel Programming with Scala FuturesSimple Web and Api Servers with Scala

Updated 2019-09-11