Haoyi's Programming Blog

How to work with JSON in Scala

Posted 2019-06-26

JSON is one of the most common data interchange formats: a human-readable way of exchanging structured data that is ubiquitous throughout industry. This tutorial shows how to work effectively with JSON data in Scala by walking through a few common workflows on a piece of real-world JSON data.


About the Author: Haoyi is a software engineer, and the author of many open-source Scala tools such as the Ammonite REPL and the Mill Build Tool. If you enjoyed the contents on this blog, you may also enjoy Haoyi's book Hands-on Scala Programming


The easiest way to work with JSON is through the uPickle library. This is available on Maven Central for you to use with any version of Scala:

// SBT
"com.lihaoyi" %% "upickle" % "0.7.1"

// Mill
ivy"com.lihaoyi::upickle:0.7.1"

uJson (the JSON library underlying uPickle) and uPickle also come bundled with Ammonite, and can be used within the REPL and *.sc script files. This tutorial focuses on walking through a concrete example; for deeper details on the library's syntax and functionality, you can refer to its reference documentation.

To begin with, I will install Ammonite:

$ sudo sh -c '(echo "#!/usr/bin/env sh" && curl -L https://github.com/lihaoyi/Ammonite/releases/download/1.6.8/2.13-1.6.8) > /usr/local/bin/amm && chmod +x /usr/local/bin/amm' && amm

Then open the Ammonite REPL and type ujson.<tab> and upickle.<tab> to see the list of available operations:

$ amm
Loading...
Welcome to the Ammonite Repl 1.6.8
(Scala 2.13.0 Java 11.0.2)

@ ujson.
Arr                              IncompleteParseException         StringRenderer
...

@ upickle.
Api                 JsReadWriters       MsgReadWriters      core                implicits
...

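Also, download the sample JSON data (a GitHub API listing of Ammonite releases) and save it as ammonite-releases.json in your current working directory, which is where the examples below read it from.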

Once this is set up, we are ready to begin the tutorial.

Reading JSON

Given a JSON string:

@ val jsonString = os.read(os.pwd / "ammonite-releases.json")
jsonString: String = """[
  {
    "url": "https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367",
    "assets_url": "https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367/assets",
    "upload_url": "https://uploads.github.com/repos/lihaoyi/Ammonite/releases/17991367/assets{?name,label}",
...

You can read it into a ujson.Value using ujson.read:

@ val data = ujson.read(jsonString)
data: ujson.Value = Arr(
  ArrayBuffer(
    Obj(
      LinkedHashMap(
        "url" -> Str("https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367"),
        "assets_url" -> Str("https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367/assets"),
...
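
ujson.read throws an exception when its input is not well-formed JSON. As a minimal sketch of one way to guard against that, you can wrap the call in scala.util.Try. This assumes you are in an Ammonite script or REPL where ujson is already available; the sample strings are made up for illustration and stand in for the downloaded file:

```scala
import scala.util.Try

// Hypothetical inline document standing in for ammonite-releases.json
val sample = """[{"tag_name": "1.6.8", "draft": false}]"""

// Well-formed input parses into a ujson.Value
val parsed = Try(ujson.read(sample))

// Input that is cut off part-way through makes ujson.read throw,
// which Try captures as a Failure instead of crashing the script
val truncated = Try(ujson.read("[1, 2"))

// Fall back to a default when parsing fails
val releaseCount = parsed.map(_.arr.length).getOrElse(0)
```

Whether to recover like this or let the exception propagate depends on the program; for a one-off script, failing loudly is often fine.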

Extracting Values from JSON

You can look up entries in the JSON data structure using data(...) syntax, e.g.

@ data(0)
res3: ujson.Value = Obj(
  LinkedHashMap(
    "url" -> Str("https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367"),
    "assets_url" -> Str("https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367/assets"),
...

@ data(0)("url")
res4: ujson.Value = Str("https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367")

@ data(0)("author")("id")
res5: ujson.Value = Num(2.0607116E7)

A ujson.Value can be one of several types:

sealed trait Value

case class Str(value: String) extends Value

case class Obj(value: mutable.LinkedHashMap[String, Value]) extends Value

case class Arr(value: ArrayBuffer[Value]) extends Value

case class Num(value: Double) extends Value

sealed abstract class Bool extends Value
case object False extends Bool
case object True extends Bool

case object Null extends Value

You can conveniently cast a ujson.Value to a specific sub-type and get its internal data by using the .bool, .num, .arr, .obj, or .str methods:

@ data.
apply      bool       num        render     transform  value
arr        isNull     obj        str        update

For example, fetching the fields of a ujson.Obj:

@ data(0).obj
res6: collection.mutable.LinkedHashMap[String, ujson.Value] = LinkedHashMap(
  "url" -> Str("https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367"),
  "assets_url" -> Str("https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367/assets"),
...

@ data(0).obj.keys
res7: Iterable[String] = Set(
  "url",
  "assets_url",
  "upload_url",
...

@ data(0).obj.size
res8: Int = 18

Or the values in primitive types:

@ data(0)("url").str
res9: String = "https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367"

@ data(0)("author")("id").num
res10: Double = 2.0607116E7

ujson.Nums are stored as doubles. You can call .toInt to convert to an integer:

@ data(0)("author")("id").num.toInt
res11: Int = 20607116
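
To pull these pieces together, here is a small sketch on a made-up inline document (the big field is hypothetical and not part of the GitHub data). It assumes ujson is on the classpath, as it is inside Ammonite. It also illustrates two things worth knowing: casting to the wrong sub-type or looking up a missing key throws an exception, and .toInt clamps numbers that do not fit in an Int, so .toLong may be the safer choice for large IDs:

```scala
import scala.util.Try

// Hypothetical document; only its shape mirrors the release data above
val release = ujson.read(
  """{"tag_name": "1.6.8", "id": 17991367, "draft": false, "big": 3000000000}"""
)

val tag: String = release("tag_name").str
val id: Int = release("id").num.toInt
val draft: Boolean = release("draft").bool

// Casting a Str to a number throws rather than returning a default
val wrongCast = Try(release("tag_name").num)

// Looking up a key that is not present also throws
val missingKey = Try(release("no_such_key"))

// 3000000000 is larger than Int.MaxValue: toInt clamps, toLong keeps the value
val clamped: Int = release("big").num.toInt
val exact: Long = release("big").num.toLong
```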

Generating JSON

You can construct JSON data using the ujson.* constructors:

@ val output = ujson.Arr(
    ujson.Obj("hello" -> ujson.Str("world"), "answer" -> ujson.Num(42)),
    ujson.Bool(true)
  )
output: ujson.Arr = Arr(
  ArrayBuffer(Obj(LinkedHashMap("hello" -> Str("world"), "answer" -> Num(42.0))), true)
)

The constructors for primitive types like numbers, strings, and booleans are optional:

@ val output = ujson.Arr(
    ujson.Obj("hello" -> "world", "answer" -> 42),
    true
  )
output: ujson.Arr = Arr(
  ArrayBuffer(Obj(LinkedHashMap("hello" -> Str("world"), "answer" -> Num(42.0))), true)
)

These can be serialized back to a string using the ujson.write function:

@ ujson.write(output)
res13: String = "[{\"hello\":\"world\",\"answer\":42},true]"

@ println(ujson.write(output))
[{"hello":"world","answer":42},true]

By default, the output JSON is compact. You can pass in an indent parameter if you want your output JSON formatted in a human-readable fashion:

@ println(ujson.write(output, indent = 4))
[
    {
        "hello": "world",
        "answer": 42
    },
    true
]
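
As a rough sketch of how generating and reading fit together, the snippet below builds a small object, writes it in both compact and indented form, and checks that parsing the indented form and writing it again gives back the compact string. The field names are made up for illustration, and the snippet assumes ujson is available as it is inside Ammonite:

```scala
// Hypothetical data, built with the same constructors shown above
val newRelease = ujson.Obj(
  "tag_name" -> "1.6.8",
  "draft" -> false,
  "assets" -> ujson.Arr("2.12-1.6.8", "2.13-1.6.8")
)

// Compact form: no whitespace, keys in insertion order
val compact = ujson.write(newRelease)

// Human-readable form
val pretty = ujson.write(newRelease, indent = 2)

// Whitespace does not affect the parsed value, so re-serializing
// the indented text compactly reproduces the compact string
val roundTripped = ujson.write(ujson.read(pretty))
```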

Modifying JSON

ujson.Values are mutable, and can be modified the same way as JSON structures in any other language:

@ println(output)
[{"hello":"world","answer":42},true]

@ output(0)("hello") = "goodbye"

@ output(0)("tags") = ujson.Arr("awesome", "yay", "wonderful")

@ println(output)
[{"hello":"goodbye","answer":42,"tags":["awesome","yay","wonderful"]},true]

When treating ujson.Arrs as buffers or ujson.Objs as maps, you need to use .arr or .obj to cast the value beforehand:

@ println(output)
[{"hello":"goodbye","answer":42,"tags":["awesome","yay","wonderful"]},true]

@ output(0).obj.remove("hello")

@ println(output)
[{"answer":42,"tags":["awesome","yay","wonderful"]},true]

@ output.arr.append(123)

@ println(output)
[{"answer":42,"tags":["awesome","yay","wonderful"]},true,123]

@ output.arr.clear()

@ println(output)
[]
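
One consequence of mutability is that every reference to a ujson.Value sees your changes. If you need to keep the original around, one simple (if not the most efficient) approach is to take a deep copy by writing the value to a string and reading it back. The sketch below assumes ujson is available as in Ammonite and uses a made-up document:

```scala
val original = ujson.read("""{"hello": "world", "tags": ["a", "b"]}""")

// A second reference to the very same mutable tree
val alias = original

// An independent deep copy, made via a string round trip
val snapshot = ujson.read(ujson.write(original))

// Mutate through the first reference
original("hello") = "goodbye"
original("tags").arr.append("c")

val aliasJson = ujson.write(alias)       // reflects the mutations
val snapshotJson = ujson.write(snapshot) // still the old contents
```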

Traversing JSON

Going back to our original data object:

@ ujson.write(data, indent = 4)
res40: String = """[
    {
        "url": "https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367",
        "assets_url": "https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367/assets",
        "upload_url": "https://uploads.github.com/repos/lihaoyi/Ammonite/releases/17991367/assets{?name,label}",
        "html_url": "https://github.com/lihaoyi/Ammonite/releases/tag/1.6.8",
        "id": 17991367,
...

To traverse the tree structure of a ujson.Value, we can use a recursive function. For example, here is one that recurses over data and collects all the ujson.Str nodes in the JSON structure:

@ def traverse(v: ujson.Value): Iterable[String] = v match{
    case a: ujson.Arr => a.arr.flatMap(traverse)
    case o: ujson.Obj => o.obj.values.flatMap(traverse)
    case s: ujson.Str => Seq(s.str)
    case _ => Nil
  }
defined function traverse

@ traverse(data)
res45: Iterable[String] = ArrayBuffer(
  "https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367",
  "https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367/assets",
  "https://uploads.github.com/repos/lihaoyi/Ammonite/releases/17991367/assets{?name,label}",
  "https://github.com/lihaoyi/Ammonite/releases/tag/1.6.8",
  "MDc6UmVsZWFzZTE3OTkxMzY3",
  "1.6.8",
  "master",
...

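The same recursive pattern can carry extra state down the tree. As an illustrative sketch (the function name stringsWithPaths and the path notation are my own invention, not part of uJson), here is a variant that records where each string was found. It assumes ujson is available as in Ammonite and runs on a small made-up document:

```scala
// Collects (path, string) pairs; "data" is just a label for the root
def stringsWithPaths(v: ujson.Value, path: String = "data"): Seq[(String, String)] = v match {
  case a: ujson.Arr =>
    // Extend the path with the array index of each element
    a.arr.toSeq.zipWithIndex.flatMap { case (child, i) => stringsWithPaths(child, s"$path[$i]") }
  case o: ujson.Obj =>
    // Extend the path with the key of each field
    o.obj.toSeq.flatMap { case (key, child) => stringsWithPaths(child, s"$path.$key") }
  case s: ujson.Str => Seq(path -> s.str)
  case _ => Nil
}

val doc = ujson.read(
  """{"name": "1.6.8",
      "author": {"login": "Ammonite-Bot", "id": 20607116},
      "assets": [{"name": "2.12-1.6.8"}, {"name": "2.13-1.6.8"}]}"""
)

val found = stringsWithPaths(doc)
```
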
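We could also modify the ujson.Value during traversal. Here's a new version of traverse that recurses over data and removes all key-value pairs where the value is a string starting with https://. It relies on filterInPlace, which is available on Scala 2.13 (as used in this REPL session); on Scala 2.12 the equivalent method is retain: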

@ def traverse(v: ujson.Value): Boolean = v match{
    case a: ujson.Arr =>
      a.arr.foreach(traverse)
      true
    case o: ujson.Obj =>
      o.obj.filterInPlace{case (k, v) => traverse(v)}
      true
    case s: ujson.Str => !s.str.startsWith("https://")
    case _ => true
  }

@ ujson.write(data, indent = 4)
res52: String = """[
    {
        "id": 17991367,
        "node_id": "MDc6UmVsZWFzZTE3OTkxMzY3",
        "tag_name": "1.6.8",
        "target_commitish": "master",
        "name": "1.6.8",
        "draft": false,
        "author": {
            "login": "Ammonite-Bot",
            "id": 20607116,
            "node_id": "MDQ6VXNlcjIwNjA3MTE2",
...
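
If you want the same behavior without depending on filterInPlace, a variant along the following lines should work on either Scala version. This is only a sketch: dropMatching and its predicate parameter are hypothetical names, the document is made up, and ujson is assumed to be available as in Ammonite. Note that, like the function above, it only removes key-value pairs, so matching strings that sit directly inside an array are left alone:

```scala
def dropMatching(v: ujson.Value, shouldDrop: String => Boolean): Unit = v match {
  case a: ujson.Arr =>
    a.arr.foreach(dropMatching(_, shouldDrop))
  case o: ujson.Obj =>
    // Collect the doomed keys first, so the map is not modified while iterating
    val doomed = o.obj.toList.collect { case (k, s: ujson.Str) if shouldDrop(s.str) => k }
    doomed.foreach(k => o.obj.remove(k))
    // Then recurse into whatever is left
    o.obj.values.foreach(dropMatching(_, shouldDrop))
  case _ => ()
}

val page = ujson.read(
  """{"url": "https://example.com/a",
      "name": "1.6.8",
      "author": {"avatar": "https://example.com/b", "login": "bot"},
      "links": ["https://example.com/c"]}"""
)

dropMatching(page, _.startsWith("https://"))

val cleaned = ujson.write(page)
```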

Converting To/From Scala Data Types

Often you do not just want dynamically-typed JSON trees: you expect the data to conform to a particular schema, you want your code to use that schema in a type-safe way, and you want it to fail early (and loudly!) if the incoming data doesn't conform. You can do that by defining a case class representing the fields and types you expect to be present in the JSON, and using upickle.default.macroRW and upickle.default.read to extract those fields from the JSON:

@ println(ujson.write(data(0)("author"), indent=4))
{
    "login": "Ammonite-Bot",
    "id": 20607116,
    "node_id": "MDQ6VXNlcjIwNjA3MTE2",
    "gravatar_id": "",
    "type": "User",
    "site_admin": false
}

@ case class Author(login: String, id: Int, site_admin: Boolean)
defined class Author

@ implicit val authorRW = upickle.default.macroRW[Author]

@ val author = upickle.default.read[Author](data(0)("author"))
author: Author = Author("Ammonite-Bot", 20607116, false)

@ author.login
res60: String = "Ammonite-Bot"

@ author.id
res61: Int = 20607116

@ author.site_admin
res62: Boolean = false

Here, the field names in the case class Author correspond to the fields in the JSON that you want to read, and the values are deserialized to the corresponding types (String, Int, Boolean). Extra fields present in the input JSON are ignored. Note that every case class you want to read or write must have a corresponding implicit upickle.default.macroRW definition in scope.
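
To illustrate the "fail early" behavior, here is a small sketch that reads one object that matches the schema and one that is missing a required field. It assumes ujson and upickle are on the classpath, as they are in Ammonite, and uses made-up inline JSON rather than the downloaded data:

```scala
import scala.util.Try
import upickle.default.{ReadWriter, macroRW, read}

case class Author(login: String, id: Int, site_admin: Boolean)
implicit val authorRW: ReadWriter[Author] = macroRW[Author]

// Matches the schema; the extra "type" field is ignored
val complete = ujson.read(
  """{"login": "Ammonite-Bot", "id": 20607116, "site_admin": false, "type": "User"}"""
)
val author = read[Author](complete)

// "site_admin" is missing, so reading throws instead of inventing a value
val incomplete = ujson.read("""{"login": "Ammonite-Bot", "id": 20607116}""")
val failed = Try(read[Author](incomplete))
```

Giving the implicit an explicit ReadWriter[Author] type is optional in the REPL, but tends to make things clearer in larger codebases.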

If you want a JSON field to deserialize to a case class field of a different name, you can use the @upickle.implicits.key annotation:

@ case class Author(login: String,
                    id: Int,
                    @upickle.implicits.key("site_admin") siteAdmin: Boolean)

@ implicit val authorRW = upickle.default.macroRW[Author]

@ val author = upickle.default.read[Author](data(0)("author"))
author: Author = Author("Ammonite-Bot", 20607116, false)

Your Scala case classes can be converted back into JSON using upickle.default.write:

@ upickle.default.write(author)
res68: String = "{\"login\":\"Ammonite-Bot\",\"id\":20607116,\"site_admin\":false}"

@ println(upickle.default.write(author))
{"login":"Ammonite-Bot","id":20607116,"site_admin":false}

You can also deserialize to Seqs and other builtin data structures. Here we read assets, which is a JSON array of objects, into a Scala Seq[Asset]:

@ ujson.write(data(0)("assets"), indent=4)
res75: String = """[
    {
        "id": 13194960,
        "node_id": "MDEyOlJlbGVhc2VBc3NldDEzMTk0OTYw",
        "name": "2.12-1.6.8",
        "label": "",
        "uploader": {
            "login": "Ammonite-Bot",
            "id": 20607116,
            "node_id": "MDQ6VXNlcjIwNjA3MTE2",
            "gravatar_id": "",
            "type": "User",
            "site_admin": false
        },
        "content_type": "application/octet-stream",
        "state": "uploaded",
        "size": 33951394,
        "download_count": 833,
        "created_at": "2019-06-14T07:54:16Z",
        "updated_at": "2019-06-14T07:54:17Z"
    },
    {
        "id": 13194961,
        "node_id": "MDEyOlJlbGVhc2VBc3NldDEzMTk0OTYx",
...

@ case class Asset(id: Int, name: String)

@ implicit val assetRW = upickle.default.macroRW[Asset]

@ upickle.default.read[Seq[Asset]](data(0)("assets"))
res81: Seq[Asset] = List(
  Asset(13194960, "2.12-1.6.8"),
  Asset(13194961, "2.13-1.6.8"),
  Asset(13199400, "2.12-1.6.8-1-c7a656e"),
  Asset(13199401, "2.13-1.6.8-1-c7a656e"),
  Asset(13220957, "2.12-1.6.8-2-0a2abd6"),
...

You can also deserialize nested case classes:

@ case class Uploader(id: Int, login: String, `type`: String)

@ case class Asset(id: Int, name: String, uploader: Uploader)

@ implicit val uploaderRW = upickle.default.macroRW[Uploader]

@ implicit val assetRW = upickle.default.macroRW[Asset]

@ val assets = upickle.default.read[Seq[Asset]](data(0)("assets"))
assets: Seq[Asset] = List(
  Asset(13194960, "2.12-1.6.8", Uploader(20607116, "Ammonite-Bot", "User")),
  Asset(13194961, "2.13-1.6.8", Uploader(20607116, "Ammonite-Bot", "User")),
  Asset(13199400, "2.12-1.6.8-1-c7a656e", Uploader(20607116, "Ammonite-Bot", "User")),
  Asset(13199401, "2.13-1.6.8-1-c7a656e", Uploader(20607116, "Ammonite-Bot", "User")),
  Asset(13220957, "2.12-1.6.8-2-0a2abd6", Uploader(20607116, "Ammonite-Bot", "User")),
  Asset(13220958, "2.13-1.6.8-2-0a2abd6", Uploader(20607116, "Ammonite-Bot", "User")),
...

If you wish to store a dynamically typed JSON field within your case class, simply label it as ujson.Value:

@ case class Asset(id: Int, name: String, uploader: ujson.Value)

@ implicit val assetRW = upickle.default.macroRW[Asset]

@ val assets = upickle.default.read[Seq[Asset]](data(0)("assets"))
assets: Seq[Asset] = List(
  Asset(
    13194960,
    "2.12-1.6.8",
    Obj(
      LinkedHashMap(
        "login" -> Str("Ammonite-Bot"),
        "id" -> Num(2.0607116E7),
        "node_id" -> Str("MDQ6VXNlcjIwNjA3MTE2"),
        "gravatar_id" -> Str(""),
        "type" -> Str("User"),
        "site_admin" -> false
      )
    )
  ),
...

@ println(assets(0).uploader)
{"login":"Ammonite-Bot","id":20607116,"node_id":"MDQ6VXNlcjIwNjA3MTE2","gravatar_id":"","type":"User","site_admin":false}

@ println(assets(0).uploader.obj.keys)
Set(login, id, node_id, gravatar_id, type, site_admin)
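
Mixing typed and untyped fields like this can be handy when only part of a payload has a stable schema. A minimal sketch, assuming ujson and upickle are available as in Ammonite (the class name LooseAsset and the inline JSON are made up for illustration):

```scala
import upickle.default.{ReadWriter, macroRW, read}

// id and name are typed; uploader is kept as a dynamic JSON tree
case class LooseAsset(id: Int, name: String, uploader: ujson.Value)
implicit val looseAssetRW: ReadWriter[LooseAsset] = macroRW[LooseAsset]

val loose = read[LooseAsset](
  """{"id": 13194960, "name": "2.12-1.6.8",
      "uploader": {"login": "Ammonite-Bot", "id": 20607116}}"""
)

// The typed part is accessed as ordinary fields...
val assetName: String = loose.name

// ...while the dynamic part uses the ujson.Value operations from earlier
val login: String = loose.uploader("login").str
val uploaderKeys: List[String] = loose.uploader.obj.keys.toList
```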

Lastly, all our Scala data types can be converted back to JSON strings using upickle.default.write:

@ upickle.default.write(assets, indent=4)
res90: String = """[
    {
        "id": 13194960,
        "name": "2.12-1.6.8",
        "uploader": {
            "id": 20607116,
            "login": "Ammonite-Bot",
            "type": "User"
        }
    },
    {
        "id": 13194961,
        "name": "2.13-1.6.8",
        "uploader": {
            "id": 20607116,
...

While the above examples all demonstrated using upickle.default.read to read ujson.Values into typed Scala case classes, you can also use it to efficiently read raw Strings into case classes without the overhead of intermediate data structures.
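
For example, a sketch of reading a raw String straight into nested case classes and writing it back out might look like the following. It assumes ujson and upickle are available as in Ammonite; the inline JSON is made up, and the Uploader class here is a trimmed-down version of the one above:

```scala
import upickle.default.{ReadWriter, macroRW, read, write}

case class Uploader(id: Int, login: String)
case class Asset(id: Int, name: String, uploader: Uploader)
implicit val uploaderRW: ReadWriter[Uploader] = macroRW[Uploader]
implicit val assetRW: ReadWriter[Asset] = macroRW[Asset]

val raw =
  """[{"id": 13194960, "name": "2.12-1.6.8",
       "uploader": {"id": 20607116, "login": "Ammonite-Bot"}}]"""

// The String goes straight to case classes, with no ujson.Value in between
val assets = read[Seq[Asset]](raw)

// Writing and re-reading gives back an equal value
val json = write(assets)
val again = read[Seq[Asset]](json)
```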

Conclusion

This tutorial introduces you to the basics of working with JSON in a Scala program, using the uPickle library. We have walked through common workflows: reading JSON data, extracting values of interest from it, generating and modifying our own JSON data, traversing the JSON tree structure, and finally converting between untyped JSON data and typed Scala case classes.

This post only covers the basics of the uPickle JSON library. You can refer to its reference documentation for more details.





Updated 2019-06-26