One strength of Scala is implementing programming languages. Even if your goal is not to implement an entirely new programming language, these techniques are still useful: for writing linters, program analyzers, query engines, and other such tools. This blog post will walk you through the process of implementing a simple programming language in Scala, introduce some of the fundamental concepts involved, and finishing with a working interpreter for a simple programming language.
About the Author: Haoyi is a software engineer, and the author of many open-source Scala tools such as the Ammonite REPL and the Mill Build Tool. If you enjoyed the contents on this blog, you may also enjoy Haoyi's book Hands-on Scala Programming
The goal of this exercise will to implement a subset of the Jsonnet programming language:
input.jsonnet
local greeting = "Hello ";
local person = function(name){
"name": name,
"welcome": greeting + name + "!"
};
{
"person1": person("Alice"),
"person2": person("Bob"),
"person3": person("Charlie")
}
output.json
{
"person1": {
"name": "Alice",
"welcome": "Hello Alice!"
},
"person2": {
"name": "Bob",
"welcome": "Hello Bob!"
},
"person3": {
"name": "Charlie",
"welcome": "Hello Charlie!"
}
}
Jsonnet is a simple language meant to construct JSON configuration files: the output of evaluating a .jsonnet
file is a single JSON structure containing dictionaries, lists, strings, numbers, and booleans. The output JSON can then be used to configure Kubernetes, CloudFormation, Terraform, or other software systems. Jsonnet is used heavily at Google, Databricks, and other companies in order to manage their large and complex system configurations.
Our interpreter will roughly follow the following steps:
String
into a syntax tree structure called Expr
Expr
into a Jsonnet value called Value
Value
into an output JSON String
This tutorial will walk you through all three phases of implementing our simple interpreter, giving you an intuition for the techniques, data structures and algorithms involved. While not a comprehensive or production-ready implementation, this tutorial should hopefully give you enough foundation to get started with your own simple language-related projects.
For this exercise, we will do all our work inside the Ammonite Scala REPL. Ammonite can be installed via:
$ sudo sh -c '(echo "#!/usr/bin/env sh" && curl -L https://github.com/lihaoyi/Ammonite/releases/download/1.6.8/2.13-1.6.8) > /usr/local/bin/amm && chmod +x /usr/local/bin/amm' && amm
Loading...
Welcome to the Ammonite Repl 1.6.8
(Scala 2.13.0 Java 11.0.2)
If you like Ammonite, please support our development at www.patreon.com/lihaoyi
@
Once the prompt is open, we are ready to begin.
local greeting = "Hello ";
local person = function(name){
"name": name,
"welcome": greeting + name + "!"
};
{
"person1": person("Alice"),
"person2": person("Bob"),
"person3": person("Charlie")
}
Jsonnet itself is similar to JSON, but introduces the constructs that help you tidy up verbose or repetitive portions of your JSON config: local variables like greeting
above, function definitions like person
, and basic operations like concatenating strings with +
. For the purpose of this exercise, we will stop at implementing these three simple features. Further features and improvements can be implemented the same way, and we will leave the task of exhaustively implementing the entire Jsonnet language to production intepreters like Sjsonnet.
Jsonnet has a similar set of primitives as JSON. For this tutorial, we will consider just a subset of them:
"hello world" // strings
{"key": "value", "thing": "123"} // dictionaries
For now, we will assume strings do not contain escape sequences like \n
or \"
, and leave out additional data types like numbers, booleans, arrays, or null.
We will also assume strings only support a single operation: concatenation via +
.
"hello" + "world"
"helloworld"
You can define local variables using the local
keyword, the assignment of an expression to a name, a semicolon ;
, and then the final expression:
local greeting = "Hello ";
greeting + greeting
"Hello Hello "
You can define local functions by combining the local
keyword with a function
expression:
local hello = function(name) "Hello " + name;
hello("Bob")
"Hello Bob"
As in most programming languages, functions are called using parentheses containing the arguments you wish to pass.
Local variables and functions can also return structured data in dictionaries:
local person = function(name) {
"name": name,
"welcome": "Hello " + name + '!',
};
person("Bob")
{
"name": "Bob",
"welcome": "Hello Bob!",
};
The language features described here can be combined in arbitrary ways, e.g. a function can be called inside a dictionary value:
local f = function(x) "Hello " + x;
{"key": "value", "thing": f("World")}
{
"key": "value",
"thing": "Hello World"
}
And that function may itself return a dictionary, that gets nested within:
local f = function(x) {"nested key": "Hello " + x};
{"key": "value", "thing": f("World")}
{
"key": "value",
"thing": {
"nested key": "Hello World"
}
}
This quick tour is only a small subset of the full Jsonnet language, but will be enough for this tutorial.
To parse Jsonnet, we will be using the Fastparse library introduced in this blog post:
Let's start by defining a syntax tree of what our minimal Jsonnet syntax looks like: strings, dictionaries, functions, and local definitions. We will do so using a Scala sealed trait
and case class
es:
sealed trait Expr
object Expr{
case class Str(s: String) extends Expr
case class Ident(name: String) extends Expr
case class Plus(nodes: Seq[Expr]) extends Expr
case class Dict(pairs: Map[String, Expr]) extends Expr
case class Local(name: String, assigned: Expr, body: Expr) extends Expr
case class Func(argNames: Seq[String], body: Expr) extends Expr
case class Call(expr: Expr, args: Seq[Expr]) extends Expr
}
(To enter this code into the Ammonite REPL, wrap it in a pair of curlies {...}
so it is entered as a single unit)
Here, the Expr
data structure is meant to represent the meaningful parts of the Jsonnet syntax. Here are some example snippets and what we expect them to parse to.
Simple strings:
"hello"
Str("hello")
Dictionaries:
{"hello": "world", "123": "456"}
Dict(Map("hello" -> Str("world"), "123" -> Str("456")))
Functions:
function(a, b) a + " " + b
Func(Seq("a", "b"), Plus(Seq(Ident("a"), Str(" "), Ident("b"))))
Local variable declarations:
local f = function(a) a + "1";
f("123")
Local("f", Func(Seq("a"), Plus(Seq(Ident("a"), Str("1"))), Call(Ident("f"), Str(123))
And of course, we expect to be able to parse any combination of these language features combined together.
To begin using Fastparse, we will use the following imports:
@ import fastparse._, MultiLineWhitespace._
import fastparse._, MultiLineWhitespace._
First, let's write the parser for Str
. We will ignore escape characters for simplicity, meaning a string is simply a "
, followed by zero-or-more non-"
characters, closed by another "
:
@ def str[_: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" ).map(Expr.Str)
defined function str
@ fastparse.parse("\"hello\"", str(_))
res10: Parsed[Str] = Success(Str("hello"), 7)
@ fastparse.parse("\"hello world\"", str(_))
res11: Parsed[Str] = Success(Str("hello world"), 13)
@ fastparse.parse("\"\"", str(_))
res12: Parsed[Str] = Success(Str(""), 2)
@ println(fastparse.parse("123", str(_)))
Parsed.Failure(Position 1:1, found "123")
Note how we use the ~~/
operator after the "\""
open quote: the ~~
means we do not want to consume whitespace here (since we are inside a string) and the /
is a Fastparse Cut meaning we want to avoid backtracking if the parse fails. A detailed discussion of what Cuts give us is left to the linked documentation.
Identifiers are similar: an alphabet or underscore, followed by zero or more alphanumeric characters.
@ def ident[_: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!.map(Expr.Ident)
defined function ident
@ fastparse.parse("hello", ident(_))
res17: Parsed[Ident] = Success(Ident("hello"), 5)
@ println(fastparse.parse("123", ident(_)))
Parsed.Failure(Position 1:1, found "123")
The next syntax tree node we will look into parsing is Expr.Plus
, used to model the a + b
syntax.
The Expr.Plus
nodes representing +
, and all other case class
es in our syntax tree, are a bit more complex: they have a recursive definition where Plus
is an Expr
, but an Expr
could be a Plus
node. This can be solved by making our parsers recursive, as follows:
@ {
def expr[_: P]: P[Expr] = P( prefixExpr ~ plus.rep ).map{
case (e, Nil) => e
case (e, items) => Expr.Plus(e +: items)
}
def prefixExpr[_: P] = P( str | ident )
def str[_: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" ).map(Expr.Str)
def ident[_: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!.map(Expr.Ident)
def plus[_: P] = P( "+" ~ prefixExpr )
}
@ fastparse.parse("a + b", expr(_))
res65: Parsed[Expr] = Success(Plus(List(Ident("a"), Ident("b"))), 5)
@ fastparse.parse("a + b + c", expr(_))
res66: Parsed[Expr] = Success(Plus(List(Ident("a"), Ident("b"), Ident("c"))), 9)
@ fastparse.parse("a + \" \" + c", expr(_))
res67: Parsed[Expr] = Success(Plus(List(Ident("a"), Str(" "), Ident("c"))), 11)
Note that we cannot simply define plus
as expr ~ "+" ~ expr
; this is because the plus
parser would then be Left Recursive, causing an infinite recursion at parse time. Instead, we need to define plus
as just the suffix "+" ~ prefixExpr
, and have the expr
parser do the work of repeating plus
via ~ plus.rep
and aggregating the results into a Expr.Plus
node if non-empty.
Expr.Dict
nodes are also recursive, each comma-separated key-value pair containing a Expr.Str
key and a Expr
value. We can parse those as follows:
@ {
def expr[_: P]: P[Expr] = P( prefixExpr ~ plus.rep).map{
case (e, Nil) => e
case (e, items) => Expr.Plus(e +: items)
}
def prefixExpr[_: P] = P( str | ident | dict )
def str[_: P] = P( str0 ).map(Expr.Str)
def str0[_: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" )
def ident[_: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!.map(Expr.Ident)
def plus[_: P] = P( "+" ~ prefixExpr )
def dict[_: P] = P( "{" ~/ (str0 ~ ":" ~/ expr).rep(0, ",") ~ "}" ).map(kvs => Expr.Dict(kvs.toMap))
}
@ fastparse.parse("""{"a": "b", "cde": id}""", expr(_))
res84: Parsed[Expr] = Success(Dict(Map("a" -> Str("b"), "cde" -> Ident("id"))), 21)
@ fastparse.parse("""{"a": "b", "cde": id}""", expr(_))
res85: Parsed[Expr] = Success(Dict(Map("a" -> Str("b"), "cde" -> Ident("id"))), 21)
@ fastparse.parse("""{"a": "b", "cde": id, "nested": {}}""", expr(_))
res86: Parsed[Expr] = Success(
Dict(Map("a" -> Str("b"), "cde" -> Ident("id"), "nested" -> Dict(Map()))),
35
)
Note how we extracted the str0
parser from str
: str0
returns the raw String
that was parsed, while str
wraps it in an Expr.Str
syntax tree node. Since Expr.Dict
keys are syntactically the same as Expr.Str
s, but do not need to be wrapped in Expr.Str
nodes, we can re-use the str0
parser in our dict
parser to parse them as well.
Adding the parsers for func
, local
and call
to this code, we get the following:
object Parser {
def expr[_: P]: P[Expr] = P( prefixExpr ~ plus.rep ).map{
case (e, Nil) => e
case (e, items) => Expr.Plus(e +: items)
}
def prefixExpr[_: P]: P[Expr] = P( callExpr ~ call.rep ).map{ case (e, items) =>
items.foldLeft(e)((f, args) => Expr.Call(f, args))
}
def callExpr[_: P] = P( str | dict | local | func | ident )
def str[_: P] = P( str0 ).map(Expr.Str)
def str0[_: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" )
def ident[_: P] = P( ident0 ).map(Expr.Ident)
def ident0[_: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!
def dict[_: P] = P( "{" ~/ (str0 ~ ":" ~/ expr).rep(0, ",") ~ "}" ).map(kvs => Expr.Dict(kvs.toMap))
def local[_: P] = P( "local" ~/ ident0 ~ "=" ~ expr ~ ";" ~ expr ).map(Expr.Local.tupled)
def func[_: P] = P( "function" ~/ "(" ~ ident0.rep(0, ",") ~ ")" ~ expr ).map(Expr.Func.tupled)
def plus[_: P] = P( "+" ~ prefixExpr )
def call[_: P] = P( "(" ~/ expr.rep(0, ",") ~ ")" )
}
func
and local
are relatively straightforward: each one starts with a keyword, and they can be parsed recursively without issue. We have also split out ident0
from ident
, since the func
parser uses the same syntax as ident
for parsing its parameter list but does not need the identifiers to be boxed into Expr.Ident
syntax tree nodes.
Note that call
also needs us to split up prefixExpr
further into callExpr
, since the a()
call syntax would otherwise be left-recursive similar to the a + b
plus syntax we saw earlier.
We can test the parser manually and see that it does what we want:
@ fastparse.parse("\"123\"", Parser.expr(_))
res63: Parsed[Expr] = Success(Str("123"), 5)
@ fastparse.parse("id", Parser.expr(_))
res64: Parsed[Expr] = Success(Ident("id"), 2)
@ fastparse.parse("a + b", Parser.expr(_))
res65: Parsed[Expr] = Success(Plus(Ident("a"), Ident("b")), 5)
@ fastparse.parse("a + b + c", Parser.expr(_))
res66: Parsed[Expr] = Success(Plus(Ident("a"), Plus(Ident("b"), Ident("c"))), 9)
@ fastparse.parse("""{"a": "A", "b": "bee"}""", Parser.expr(_))
res69: Parsed[Expr] = Success(Dict(Map("a" -> Str("A"), "b" -> Str("bee"))), 22)
@ fastparse.parse("""f()(a) + g(b, c)""", Parser.expr(_))
res95: Parsed[Expr] = Success(
Plus(
List(
Call(Call(Ident("f"), List()), List(Ident("a"))),
Call(Ident("g"), List(Ident("b"), Ident("c")))
)
),
16
)
The syntax of our programming language is recursive: local
, function
, plus
, and dict
expressions can contain other expressions, nested arbitrarily deeply. We can test this out by feeding such nested examples into the expr
parser:
@ fastparse.parse(
"""local variable = "kay"; {"a": "A", "f": function(a) a + a, "nested": {"k": variable}}""",
Parser.expr(_)
)
res74: Parsed[Expr] = Success(
Local(
"variable",
Str("kay"),
Dict(
Map(
"a" -> Str("A"),
"f" -> Func(List("a"), Plus(Ident("a"), Ident("a"))),
"nested" -> Dict(Map("k" -> Ident("variable")))
)
)
),
85
)
Now that we have a syntax tree of Expr
nodes, the next step is to interpret the syntax tree to provide a runtime data structure of Value
s. We will define Value
as follows:
sealed trait Value
object Value{
case class Str(s: String) extends Value
case class Dict(pairs: Map[String, Value]) extends Value
case class Func(call: Seq[Value] => Value) extends Value
}
(To enter this code into the Ammonite REPL, wrap it in a pair of curlies {...}
so it is entered as a single unit)
Note that while the Expr
syntax tree contains nodes that represent identifiers, local variables, function application, and so on, a Value
can only be a Str
, a Dict
, or a Func
. It doesn't matter where a Value.Str
came from: whether a literal Expr.Str
in the source code, passed in as a function parameter to a Expr.Func
, or bound to a local variable via Expr.Local
, it is still the same Value.Str
. The final Value
for the entire Jsonnet program is then converted to a JSON string as the output of the program.
The contents of Value.Str
and Value.Dict
should be self explanatory. Value.Func
is a bit less obvious: by defining it as Func(call: Seq[Value] =>
Value)
, we are saying "a function is something you can pass a list of argument values to and returns a value". We will see how to instantiate these Value.Func
nodes later.
The basic task here is to write a function that converts a Expr
into a Value
:
def evaluate(expr: Expr): Value
However, the Value
returned by evaluating an Expr
isn't only dependent on the contents of that Expr
: it also depends on the enclosing scope, as the value of Expr.Ident
identifiers depends on what value was bound to that name, via local
declarations or function
parameters.
This mapping of names to Value
s is often known as the "lexical scope" within your program. Thus we might instead define evaluate
as:
def evaluate(expr: Expr, scope: Map[String, Value]): Value
From here we can start fleshing it out.
The simplest case is literal Expr.Str
, which are evaluated to the identical Value.Str
:
def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match{
case Expr.Str(s) => Value.Str(s)
}
We can then test this on a simple literal strings:
@ evaluate(fastparse.parse("\"hello\"", Parser.expr(_)).get.value, Map.empty)
res79: Value = Str("hello")
And see that it gives us what we want.
Literal dictionaries are also straightforward: Expr.Dict
s become Value.Dict
s, with the same keys, except we need to evaluate each value into its corresponding expression:
case Expr.Dict(kvs) => Value.Dict(kvs.map{case (k, v) => (k, evaluate(v, scope))})
Dictionary literals do not add or remove anything from the lexical scope, so the scope
parameter is passed through unchanged. Testing it gives us:
@ evaluate(
fastparse.parse("""{"hello": "world", "key": "value"}""", Parser.expr(_)).get.value,
Map.empty
)
res81: Value = Dict(Map("hello" -> Str("world"), "key" -> Str("value")))
Next, we will look at the Expr.Plus
nodes. We have only defined their behavior to work on string values (Value.Str
), so evaluating them is involves:
Value.Str
Value.Str
syntax tree nodecase Expr.Plus(items) =>
Value.Str(items.map(evaluate(_, scope)).map{case Value.Str(s) => s}.mkString)
@ evaluate(fastparse.parse("\"hello\" + \"world\"", Parser.expr(_)).get.value, Map.empty)
res5: Value = Str("helloworld")
The evaluate
function so far looks like this:
def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match{
case Expr.Str(s) => Value.Str(s)
case Expr.Dict(kvs) => Value.Dict(kvs.map{case (k, v) => (k, evaluate(v, scope))})
case Expr.Plus(items) =>
Value.Str(items.map(evaluate(_, scope)).map{case Value.Str(s) => s}.mkString)
}
The next thing we will look at is evaluating the local
variable syntax:
@ fastparse.parse(
"""local greeting = "Hello ";greeting + greeting""",
Parser.expr(_)
)
res85: Parsed[Expr] = Success(
Local("greeting", Str("Hello "), Plus(Ident("greeting"), Ident("greeting"))),
45
)
These are represented as Expr.Local
nodes in our syntax tree:
case class Local(name: String, assigned: Expr, body: Expr) extends Expr
The point of local
is to evaluate the assigned
expression to a value, assign that value to the name
, and then evaluate the body
expression with that value bound to that name. We can write that in code as follows:
case Expr.Local(name, assigned, body) =>
val assignedValue = evaluate(assigned, scope)
evaluate(body, scope + (name -> assignedValue))
Once a local
has put a name into scope
, evaluating Ident
identifier nodes is straightforward: simply fetch the value of that name in the scope
:
case Expr.Ident(name) => scope(name)
We can test it to see that it works in the success case:
@ evaluate(
fastparse.parse("""local greeting = "Hello "; greeting + greeting""", expr(_)).get.value,
Map.empty
)
res94: Value = Str("Hello Hello ")
Even when multiple local
s are chained together:
@ evaluate(
fastparse.parse("""local x = "Hello "; local y = "world"; x + y""", expr(_)).get.value,
Map.empty
)
res96: Value = Str("Hello world")
And also that it fails when an identifier is not found:
@ evaluate(
fastparse.parse("""local greeting = "Hello "; nope + nope""", expr(_)).get.value,
Map.empty
)
java.util.NoSuchElementException: key not found: nope
scala.collection.immutable.Map$Map1.apply(Map.scala:242)
ammonite.$sess.cmd93$.evaluate(cmd93.sc:10)
ammonite.$sess.cmd93$.evaluate(cmd93.sc:5)
ammonite.$sess.cmd95$.<clinit>(cmd95.sc:3)
Generating a nicer-looking error message is left as an exercise for the reader.
The evaluate
function now looks like:
def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match{
case Expr.Str(s) => Value.Str(s)
case Expr.Dict(kvs) => Value.Dict(kvs.map{case (k, v) => (k, evaluate(v, scope))})
case Expr.Plus(items) =>
Value.Str(items.map(evaluate(_, scope)).map{case Value.Str(s) => s}.mkString)
case Expr.Local(name, assigned, body) =>
val assignedValue = evaluate(assigned, scope)
evaluate(body, scope + (name -> assignedValue))
case Expr.Ident(name) => scope(name)
}
The last thing we have left to evaluate are the Expr.Func
function literals and Expr.Call
function application nodes:
case class Func(params: Seq[String], body: Expr) extends Expr
case class Call(expr: Expr, args: Seq[Expr]) extends Expr
Evaluating an Expr.Func
should give us a Value.Func
:
case class Func(call: Seq[Value] => Value) extends Value
And evaluating an Expr.Call
on a Value.Func
should give us the result of evaluating that function. The result could be a Value.Str
, a Value.Dict
, or even another Value.Func
.
Evaluating an Expr.Call
node is straightforward: we simply evaluate the expr:
Expr
into a Value.Func
, evaluate the args: Seq[Expr]
into a sequence of argument values, and then call the Value.Func#call
function on the evaluated argument values to give us the result
case Expr.Call(expr, args) =>
val Value.Func(call) = evaluate(expr, scope)
val evaluatedArgs = args.map(evaluate(_, scope))
call(evaluatedArgs)
The trickiness here is: how do we evaluate the Expr.Func
to produce a Value.Func
whose call
attribute does what we want?
When you think about what call
ing a function really means, it boils down to four steps:
In code, this looks like the following:
case Expr.Func(argNames, body) =>
Value.Func(args => evaluate(body, scope ++ argNames.zip(args)))
We can then test this with some of the functions we saw earlier:
@ evaluate(
fastparse.parse(
"""local f = function(a, b) a + " " + b; f("hello", "world")""",
expr(_)
).get.value,
Map.empty
)
res119: Value = Str("hello world")
@ evaluate(
fastparse.parse(
"""local hello = function(name) "Hello " + name; hello("Bob")""",
expr(_)
).get.value,
Map.empty
)
res120: Value = Str("Hello Bob")
@ evaluate(
fastparse.parse(
"""local greeting = "Hello ";
local person = function (name) {
"name": name,
"welcome": greeting + name + "!"
};
{
"person1": person("Alice"),
"person2": person("Bob"),
"person3": person("Charlie")
}
""",
expr(_)
).get.value,
Map.empty
)
res132: Value = Dict(
Map(
"person1" -> Dict(Map("name" -> Str("Alice"), "welcome" -> Str("Hello Alice!"))),
"person2" -> Dict(Map("name" -> Str("Bob"), "welcome" -> Str("Hello Bob!"))),
"person3" -> Dict(Map("name" -> Str("Charlie"), "welcome" -> Str("Hello Charlie!")))
)
)
The evaluate
function now looks like:
def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match{
case Expr.Str(s) => Value.Str(s)
case Expr.Dict(kvs) => Value.Dict(kvs.map{case (k, v) => (k, evaluate(v, scope))})
case Expr.Plus(items) =>
Value.Str(items.map(evaluate(_, scope)).map{case Value.Str(s) => s}.mkString)
case Expr.Local(name, assigned, body) =>
val assignedValue = evaluate(assigned, scope)
evaluate(body, scope + (name -> assignedValue))
case Expr.Ident(name) => scope(name)
case Expr.Call(expr, args) =>
val Value.Func(call) = evaluate(expr, scope)
val evaluatedArgs = args.map(evaluate(_, scope))
call(evaluatedArgs)
case Expr.Func(argNames, body) =>
Value.Func(args => evaluate(body, scope ++ argNames.zip(args)))
}
In the last two sections, we saw how to parse an input String
of Jsonnet source code into an Expr
syntax tree structure, and then evaluate the Expr
into a Value
data structure that represents a JSON value. In this last section of the tutorial, we will see how to serialize the Value
data structure to an output JSON String
.
The signature of our serialize
function looks like this:
def serialize(v: Value): String = ???
Given a Value
can only be a Value.Str
, Value.Dict
, or Value.Func
, we can expand this into:
def serialize(v: Value): String = v match{
case Value.Str(s) => ???
case Value.Dict(kvs) => ???
}
For now, we will assume there are no longer any Value.Func
nodes in the final data structure, as functions cannot be serialized to JSON. Production implementations of Jsonnet like google/jsonnet or Sjsonnet simply raise an error if an un-called function is present in the output.
Since earlier we assumed there are no escape sequences in our strings, we can serialize Value.Str
nodes by simply adding double quotes "
s around them:
def serialize(v: Value): String = v match{
case Value.Str(s) => "\"" + s + "\""
case Value.Dict(kvs) => ???
}
As for Value.Dict
s, we should serialize each key-value pair as "foo": "bar"
, separated by commas, and wrap the whole thing in curly brackets:
def serialize(v: Value): String = v match{ def serialize(v: Value): String = v match{
case Value.Str(s) => "\"" + s + "\""
case Value.Dict(kvs) =>
kvs.map{case (k, v) => "\"" + k + "\": " + serialize(v)}.mkString("{", ", ", "}")
}
And we can test this with the Jsonnet programs we saw earlier:
@ println(
serialize(
evaluate(
fastparse.parse(
"""{"hello": "world", "key": "value"}""",
expr(_)
).get.value,
Map.empty
)
)
)
{"hello": "world", "key": "value"}
@ println(
serialize(
evaluate(
fastparse.parse(
"""local x = "Hello "; local y = "world"; x + y""",
expr(_)
).get.value,
Map.empty
)
)
)
"Hello world"
@ println(
serialize(
evaluate(
fastparse.parse(
"""local greeting = "Hello ";
local person = function (name) {
"name": name,
"welcome": greeting + name + "!"
};
{
"person1": person("Alice"),
"person2": person("Bob"),
"person3": person("Charlie")
}""",
expr(_)
).get.value,
Map.empty
)
)
)
{
"person1": {"name": "Alice", "welcome": "Hello Alice!"},
"person2": {"name": "Bob", "welcome": "Hello Bob!"},
"person3": {"name": "Charlie", "welcome": "Hello Charlie!"}
}
Now all we need to do is wrap the entire parse -> evaluate -> serialize
pipeline in a nice helper function:
@ def jsonnet(input: String) = {
serialize(evaluate(fastparse.parse(input, expr(_)).get.value, Map.empty))
}
And it's ready for use!
@ println(jsonnet(
"""local greeting = "Hello ";
local person = function (name) {
"name": name,
"welcome": greeting + name + "!"
};
{
"person1": person("Alice"),
"person2": person("Bob"),
"person3": person("Charlie")
}"""
))
{
"person1": {"name": "Alice", "welcome": "Hello Alice!"},
"person2": {"name": "Bob", "welcome": "Hello Bob!"},
"person3": {"name": "Charlie", "welcome": "Hello Charlie!"}
}
In this tutorial, we have walked through how to use Scala to implement an interpreter for a simple programming language: a subset of Jsonnet. We used Fastparse to parse the source code string into an Expr
syntax tree, recursively evaluated the Expr
syntax tree into a Value
data structure, and serialized the Value
data structure to a JSON String
. We have done so in just 65 lines of code, shown below:
sealed trait Expr
object Expr{
case class Str(s: String) extends Expr
case class Ident(name: String) extends Expr
case class Plus(nodes: Seq[Expr]) extends Expr
case class Dict(pairs: Map[String, Expr]) extends Expr
case class Local(name: String, assigned: Expr, body: Expr) extends Expr
case class Func(argNames: Seq[String], body: Expr) extends Expr
case class Call(expr: Expr, args: Seq[Expr]) extends Expr
}
object Parser {
def expr[_: P]: P[Expr] = P( prefixExpr ~ plus.rep ).map{
case (e, Nil) => e
case (e, items) => Expr.Plus(e +: items)
}
def prefixExpr[_: P]: P[Expr] = P( callExpr ~ call.rep ).map{ case (e, items) =>
items.foldLeft(e)((f, args) => Expr.Call(f, args))
}
def callExpr[_: P] = P( str | dict | local | func | ident )
def str[_: P] = P( str0 ).map(Expr.Str)
def str0[_: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" )
def ident[_: P] = P( ident0 ).map(Expr.Ident)
def ident0[_: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!
def dict[_: P] = P( "{" ~/ (str0 ~ ":" ~/ expr).rep(0, ",") ~ "}" ).map(kvs => Expr.Dict(kvs.toMap))
def local[_: P] = P( "local" ~/ ident0 ~ "=" ~ expr ~ ";" ~ expr ).map(Expr.Local.tupled)
def func[_: P] = P( "function" ~/ "(" ~ ident0.rep(0, ",") ~ ")" ~ expr ).map(Expr.Func.tupled)
def plus[_: P] = P( "+" ~ prefixExpr )
def call[_: P] = P( "(" ~/ expr.rep(0, ",") ~ ")" )
}
sealed trait Value
object Value{
case class Str(s: String) extends Value
case class Dict(pairs: Map[String, Value]) extends Value
case class Func(call: Seq[Value] => Value) extends Value
}
def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match{
case Expr.Str(s) => Value.Str(s)
case Expr.Dict(kvs) => Value.Dict(kvs.map{case (k, v) => (k, evaluate(v, scope))})
case Expr.Plus(items) =>
Value.Str(items.map(evaluate(_, scope)).map{case Value.Str(s) => s}.mkString)
case Expr.Local(name, assigned, body) =>
val assignedValue = evaluate(assigned, scope)
evaluate(body, scope + (name -> assignedValue))
case Expr.Ident(name) => scope(name)
case Expr.Call(expr, args) =>
val Value.Func(call) = evaluate(expr, scope)
val evaluatedArgs = args.map(evaluate(_, scope))
call(evaluatedArgs)
case Expr.Func(argNames, body) =>
Value.Func(args => evaluate(body, scope ++ argNames.zip(args)))
}
def serialize(v: Value): String = v match{
case Value.Str(s) => "\"" + s + "\""
case Value.Dict(kvs) =>
kvs.map{case (k, v) => "\"" + k + "\": " + serialize(v)}.mkString("{", ", ", "}")
}
def jsonnet(input: String): String = {
serialize(evaluate(fastparse.parse(input, Parser.expr(_)).get.value, Map.empty))
}
While we did all the work so far in a REPL, you can also paste this code into a Scala Script, or include it in a larger Scala project built using Mill or SBT.
In the interest of time, this tutorial is greatly simplified. Nevertheless, the same techniques apply to implementing most interpreted programming languages, including production-ready implementations like Sjsonnet. Hopefully this post has given you an intuition for how a simple programming language is implemented. Even if you never implement your own programming language in future, these techniques may still come in useful if you find yourself writing linters, program analyzers, query engines, and other such tools.
If you are interested in learning more, here are some simple exercises that extend the interpreter we built above, which you can try out to get a better feel of how this interpreter works:
(function (x) x + x)("hello") // "hellohello"
let my_array = [2, 4, 6, 8];
let index = 4 / 2;
my_array[index + 1] // 6
Add support for strings with escape characters in the parser and serializer, e.g. "hello\nworld"
or "quotes look like \" or \'"
Use Fastparse's Index operator to keep track of the offset of each Expr
node, and show nicer error messages including the line number in the Jsonnet code:
local f = function(x) x + x;
f + f
// Error: binary operator + does not operate on functions.
// example1.jsonnet:2
"hello"("world")
// Error: only functions can be called, got string
// example1.jsonnet:1:1-17
local greeting = "Hello ";
local person = function(name){
"name": name,
"welcome": greeting + name + "!",
};
{
"person1": person("Alice"),
"person2": person("Bob"),
"person3": person("Charlie"),
}
Currently, the serialize
function creates a lot of garbage when concatenating strings to return the final value. Make it instead write its output to a java.lang.StringBuilder
to avoid allocating temporary strings.
So far, our mini-Sjsonnet interpreter lives as a single function def jsonnet
in a Scala program. Write a main method that allows someone to call this program from the command line via mini-jsonnet foo.jsonnet
to evaluate a Jsonnet file, or just mini-jsonnet
to open up an interactive interpreter. This can be done as a Scala Script with shebang line, or via an Mill Executable Assembly
If you're interested in diving deeper into the world of programming language implementation, a good next step would be the following e-book:
About the Author: Haoyi is a software engineer, and the author of many open-source Scala tools such as the Ammonite REPL and the Mill Build Tool. If you enjoyed the contents on this blog, you may also enjoy Haoyi's book Hands-on Scala Programming