Haoyi's Programming Blog

Automatic Binary Serialization in uPickle 0.7

Posted 2018-11-19

The latest version 0.7 of the uPickle Scala serialization library lets you easily serialize your Scala values to the binary MessagePack format, in addition to the existing JSON serialization format. This gives you compact, high-performance binary serialization entirely for free, for any value you were previously serializing to JSON. This blog post will explore the benefits of binary serialization, and what makes uPickle's approach to it special.


About the Author: Haoyi is a software engineer, and the author of many open-source Scala tools such as the Ammonite REPL and the Mill Build Tool. If you enjoyed the contents on this blog, you may also enjoy Haoyi's book Hands-on Scala Programming


uPickle has always provided an intuitive, performant, boilerplate-free way of serializing your Scala values to JSON strings (in the examples below, a ==> b means that a evaluates to b):

import upickle.default._

write(1)                          ==> "1"

write(Seq(1, 2, 3))               ==> "[1,2,3]"

read[Seq[Int]]("[1,2,3]")         ==> List(1, 2, 3)

write((1, "omg", true))           ==> """[1,"omg",true]"""

read[(Int, String, Boolean)]("""[1,"omg",true]""") ==> (1, "omg", true)

uPickle provides support for the most common data structures that people want to serialize: primitive types (integers, strings, booleans, ...), collections (lists, sets, maps, ...) and user-defined case classes. These should cover the vast majority of tree-like data structures that you would want to serialize to the tree-like JSON format. uPickle is among the fastest of the common Scala JSON serialization libraries, and works cross-platform on the JVM, Scala.js and (soon) Scala Native.

The latest 0.7.x release of uPickle provides drop-in binary serialization: simply replace your read and write calls with readBinary and writeBinary, and you can serialize all the same data structures to MessagePack-formatted byte arrays:

import upickle.default._

writeBinary(1)                          ==> Array(1)

writeBinary(Seq(1, 2, 3))               ==> Array(0x93.toByte, 1, 2, 3)

readBinary[Seq[Int]](Array[Byte](0x93.toByte, 1, 2, 3))  ==> List(1, 2, 3)

val serializedTuple = Array[Byte](0x93.toByte, 1, 0xa3.toByte, 111, 109, 103, 0xc3.toByte)

writeBinary((1, "omg", true))           ==> serializedTuple

readBinary[(Int, String, Boolean)](serializedTuple) ==> (1, "omg", true)
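As an illustration of where bytes like 0x93, 0xa3 and 0xc3 come from, here is a minimal hand-rolled MessagePack encoder covering only the cases these examples use. It is a sketch written for this post's examples (MsgPackSketch and encode are hypothetical names), not uPickle's implementation, and it deliberately ignores negative or large integers, floats, maps and long strings.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import scala.collection.mutable.ArrayBuffer

// Hypothetical, minimal MessagePack encoder for illustration only (not uPickle's code).
// Supports: ints 0-127 (positive fixint), booleans, strings under 32 bytes (fixstr),
// and Seqs or tuples with fewer than 16 elements (fixarray).
object MsgPackSketch {
  def encode(value: Any): Array[Byte] = {
    val out = ArrayBuffer.empty[Byte]
    write(value, out)
    out.toArray
  }

  private def write(value: Any, out: ArrayBuffer[Byte]): Unit = value match {
    case i: Int if i >= 0 && i < 128 =>
      out += i.toByte                          // positive fixint: the byte is the value itself
    case b: Boolean =>
      out += (if (b) 0xc3 else 0xc2).toByte    // 0xc3 encodes true, 0xc2 encodes false
    case s: String =>
      val bytes = s.getBytes(UTF_8)
      require(bytes.length < 32, "only fixstr is supported in this sketch")
      out += (0xa0 | bytes.length).toByte      // fixstr header: 101xxxxx, low bits = byte length
      out ++= bytes
    case xs: Seq[_] =>
      writeArray(xs, out)
    case t: Product =>
      writeArray(t.productIterator.toSeq, out) // tuples are written as arrays, as in JSON
    case other =>
      throw new IllegalArgumentException(s"not supported in this sketch: $other")
  }

  private def writeArray(items: Seq[Any], out: ArrayBuffer[Byte]): Unit = {
    require(items.length < 16, "only fixarray is supported in this sketch")
    out += (0x90 | items.length).toByte        // fixarray header: 1001xxxx, low bits = element count
    items.foreach(item => write(item, out))
  }
}
```

For example, MsgPackSketch.encode((1, "omg", true)) walks the tuple and produces 0x93 (a 3-element array), 1, 0xa3 (a 3-byte string) followed by the UTF-8 bytes of "omg", and finally 0xc3 for true: the same seven bytes as serializedTuple.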

Why Binary Serialization

There are three big wins for binary serialization:

The performance difference between uPickle's binary serialization and its JSON serialization can be seen in the following benchmarks (higher numbers are better):

Library          Reads       Writes
Jackson-Scala    2,080,682   8,905,996
Play Json        1,123,923   1,518,832
Circe            2,172,638   2,057,883
uPickle          3,078,442   4,018,176
uPickle binary   4,907,232   6,812,322

While the difference in size between MessagePack and JSON is less dramatic after compression (e.g. GZip), MessagePack's small uncompressed size reduces the need for spending CPU cycles on compressing/de-compressing messages. And if you do choose to compress them, the smaller MessagePack binaries will take less time to compress than large JSON strings. Both of these are on top of MessagePack's inherent performance advantage.
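The size argument is easy to check on the tuple example. The sketch below compares the 14-byte JSON text with the 7-byte MessagePack encoding shown earlier, and gzips the JSON using the JDK's java.util.zip.GZIPOutputStream. SizeComparison is a hypothetical name, and the byte arrays are copied from the examples in this post rather than produced by calling uPickle.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.GZIPOutputStream

object SizeComparison {
  // JSON text for (1, "omg", true), as produced by write(...) in the examples above
  val json: Array[Byte] = """[1,"omg",true]""".getBytes(UTF_8)

  // MessagePack bytes for the same tuple (serializedTuple in the examples above)
  val msgpack: Array[Byte] =
    Array[Byte](0x93.toByte, 1, 0xa3.toByte, 111, 109, 103, 0xc3.toByte)

  // Gzip-compresses a byte array entirely in memory using the JDK's GZIPOutputStream.
  def gzip(data: Array[Byte]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(buffer)
    try gz.write(data) finally gz.close() // close() finishes the deflate stream and writes the trailer
    buffer.toByteArray
  }
}
```

For a payload this small, gzip's fixed 10-byte header and 8-byte trailer make the compressed JSON longer than the original, whereas MessagePack halves the size without any compression step at all. Larger, more repetitive payloads compress far better, so this only illustrates the small-message case.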

A Drop-In Replacement

One big benefit of uPickle is that the binary MessagePack format is a drop-in replacement for the JSON format: you can serialize all of the same things you used to be able to serialize with uPickle, just to compact binary data instead of JSON. This includes primitives:

write(1)       ==> "1"
writeBinary(1) ==> Array(1)

Collections:

write(Seq(1, 2, 3))       ==> "[1,2,3]"
read[Seq[Int]]("[1,2,3]") ==> List(1, 2, 3)

writeBinary(Seq(1, 2, 3))                               ==> Array(0x93.toByte, 1, 2, 3)
readBinary[Seq[Int]](Array[Byte](0x93.toByte, 1, 2, 3)) ==> List(1, 2, 3)

Tuples:

write((1, "omg", true))                             ==> """[1,"omg",true]"""
read[(Int, String, Boolean)]("""[1,"omg",true]""")  ==> (1, "omg", true)

val serializedTuple = Array[Byte](0x93.toByte, 1, 0xa3.toByte, 111, 109, 103, 0xc3.toByte)
writeBinary((1, "omg", true))                       ==> serializedTuple
readBinary[(Int, String, Boolean)](serializedTuple) ==> (1, "omg", true)

Or case classes:

case class Thing(a: Int, b: String)
object Thing{
  implicit val rw: ReadWriter[Thing] = macroRW
}

write(Thing(1, "gg")) ==> """{"a":1,"b":"gg"}"""

writeBinary(Thing(1, "gg")) ==> Array[Byte](-126, -95, 97, 1, -95, 98, -94, 103, 103)
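Reading those bytes back is the reverse walk. The hypothetical decoder below handles only the three MessagePack cases that appear in Thing's encoding (fixmap, fixstr and positive fixint), and it makes visible that the case class is stored as a two-entry map from field names to values, just like the JSON object {"a":1,"b":"gg"}. It is an illustrative sketch, not how uPickle's readBinary works internally.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import scala.collection.mutable.ListBuffer

// Hypothetical, minimal MessagePack decoder for illustration only (not uPickle's code).
// Supports positive fixint (0x00-0x7f), fixmap (0x80-0x8f) and fixstr (0xa0-0xbf).
object MsgPackDecodeSketch {
  def decode(bytes: Array[Byte]): Any = {
    val (value, next) = readValue(bytes, 0)
    require(next == bytes.length, "unexpected trailing bytes")
    value
  }

  // Decodes one value starting at `pos`; returns it with the index just past it.
  private def readValue(bytes: Array[Byte], pos: Int): (Any, Int) = {
    val tag = bytes(pos) & 0xff                // JVM bytes are signed: -126 becomes 0x82
    if (tag <= 0x7f) (tag, pos + 1)            // positive fixint: the tag is the value
    else if (tag <= 0x8f) readMap(bytes, pos + 1, tag & 0x0f)
    else if (tag >= 0xa0 && tag <= 0xbf) {
      val len = tag & 0x1f                     // fixstr: low 5 bits hold the byte length
      (new String(bytes, pos + 1, len, UTF_8), pos + 1 + len)
    }
    else throw new IllegalArgumentException(f"tag 0x$tag%02x not supported in this sketch")
  }

  // Reads `n` key/value pairs that follow a fixmap header.
  private def readMap(bytes: Array[Byte], start: Int, n: Int): (Map[Any, Any], Int) = {
    val entries = ListBuffer.empty[(Any, Any)]
    var pos = start
    for (_ <- 0 until n) {
      val (key, afterKey) = readValue(bytes, pos)
      val (value, afterValue) = readValue(bytes, afterKey)
      entries += ((key, value))
      pos = afterValue
    }
    (entries.toMap, pos)
  }
}
```

Decoding Array[Byte](-126, -95, 97, 1, -95, 98, -94, 103, 103) this way yields Map("a" -> 1, "b" -> "gg"): 0x82 opens a two-entry map, 0xa1 introduces each of the one-byte keys "a" and "b", 1 is the integer field, and 0xa2 introduces the two-byte string "gg".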

There are many subtleties to how serialization works: streaming reading/writing to/from files, handling of defaults, custom serializers, and so forth. With most other libraries, changing to a different serialization format involves swapping out the entire library; this means different syntax for reading/writing, different implicits you need to define, different sets of things that you can and cannot serialize. Changing serialization from JSON to binary has traditionally been a non-trivial endeavor.

With uPickle, it's just a matter of swapping out read/write with readBinary/writeBinary, and everything else can keep working as before!

MessagePack

uPickle's binary serialization uses the standard MessagePack format. This has several advantages compared to rolling your own ad-hoc binary serialization scheme:

While MessagePack isn't perfect, it is a generally reasonable way of serializing JSON-like data to binary: quick to encode and decode, space-efficient, and free of the opacity, confusion, and fragility that tend to befall hand-crafted binary formats. It is not optimized for any specific use case, but as a general-purpose binary format it is very likely "good enough".

uPickle's architecture does not treat MessagePack as special: if needed, we could add support for other JSON-like serialization formats such as CBOR, BSON, and so on. Nevertheless, MessagePack is a fine format to come bundled with uPickle as its default binary serializer.
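To make the architectural point concrete, here is a toy model of one way such a design can work: a per-type writer describes its value only in terms of an abstract output format, and each concrete format (JSON text, MessagePack bytes) is one more implementation of that interface. Every name here (Format, ValueWriter, JsonFormat, PackFormat, FormatSketch) is hypothetical, and the model is far simpler than uPickle's actual internals; it handles only small ints, short strings and pairs.

```scala
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical sketch: a Format says how to build output of type T from primitive pieces.
trait Format[T] {
  def int(i: Int): T
  def str(s: String): T
  def arr(items: Seq[T]): T
}

// A ValueWriter describes a value in terms of any Format, without knowing which one.
trait ValueWriter[A] {
  def write[T](a: A, f: Format[T]): T
}

object ValueWriter {
  implicit val intWriter: ValueWriter[Int] = new ValueWriter[Int] {
    def write[T](a: Int, f: Format[T]): T = f.int(a)
  }
  implicit val stringWriter: ValueWriter[String] = new ValueWriter[String] {
    def write[T](a: String, f: Format[T]): T = f.str(a)
  }
  implicit def tuple2Writer[A, B](implicit wa: ValueWriter[A], wb: ValueWriter[B]): ValueWriter[(A, B)] =
    new ValueWriter[(A, B)] {
      def write[T](a: (A, B), f: Format[T]): T = f.arr(Seq(wa.write(a._1, f), wb.write(a._2, f)))
    }
}

// JSON backend (no string escaping: only suitable for the plain ASCII strings used here)
object JsonFormat extends Format[String] {
  def int(i: Int): String = i.toString
  def str(s: String): String = "\"" + s + "\""
  def arr(items: Seq[String]): String = items.mkString("[", ",", "]")
}

// MessagePack backend, limited to positive fixint, fixstr and fixarray
object PackFormat extends Format[Array[Byte]] {
  def int(i: Int): Array[Byte] = {
    require(i >= 0 && i < 128, "only positive fixint in this sketch")
    Array(i.toByte)
  }
  def str(s: String): Array[Byte] = {
    val bytes = s.getBytes(UTF_8)
    require(bytes.length < 32, "only fixstr in this sketch")
    (0xa0 | bytes.length).toByte +: bytes
  }
  def arr(items: Seq[Array[Byte]]): Array[Byte] = {
    require(items.length < 16, "only fixarray in this sketch")
    (0x90 | items.length).toByte +: items.flatMap(_.toSeq).toArray
  }
}

object FormatSketch {
  // The same ValueWriter instance drives both output formats.
  def writeJson[A](a: A)(implicit w: ValueWriter[A]): String = w.write(a, JsonFormat)
  def writePack[A](a: A)(implicit w: ValueWriter[A]): Array[Byte] = w.write(a, PackFormat)
}
```

In this toy model, FormatSketch.writeJson((1, "omg")) produces [1,"omg"] and FormatSketch.writePack((1, "omg")) produces the bytes 0x92, 1, 0xa3, 111, 109, 103, both from the same ValueWriter instances. Supporting another format would mean writing one more Format implementation while leaving every ValueWriter untouched.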

Conclusion

uPickle's original contribution to the Scala ecosystem was that it made it very, very easy to serialize typical data structures to simple, predictable JSON: no fancy configuration, no confusing imports, just upickle.default.write and read and you're done.

With uPickle 0.7, this capability has been extended to binary serialization: it is now trivial to serialize any of the common data structures to the standard MessagePack binary format. Whether for speeding up your existing JSON RPCs, inter-operating with third-party MessagePack-based services, or efficiently storing your Scala data in binary form on disk or in a database, uPickle's MessagePack backend makes it quick, efficient and predictable. While for more specialized tasks it may still make sense to use a hand-crafted serialization format, for most "boring" use cases where you need binary serialization you can just use uPickle.

uPickle 0.7.1 has been published to Maven Central, and is already used in the latest versions of Ammonite and Mill. Try it out!

