Golden Literal Testing in uTest 0.9.0

Posted 2025-07-30


uTest is a small unit testing library I maintain that aims for simplicity and convenience. This blog post explores the Golden testing feature newly added in uTest 0.9.0: why it is necessary, what it does, and how it works internally. This feature was inspired by the Jane Street blog post "What if writing tests was a joyful experience".

About the Author: Haoyi is a software engineer, and the author of many open-source Scala tools such as the Ammonite REPL and the Mill Build Tool. If you enjoyed the content on this blog, you may also enjoy Haoyi's book Hands-on Scala Programming.

The Motivation for Golden Tests

Golden testing, also called Snapshot testing, compares the output of your code against some pre-defined "Golden" value. What differentiates golden tests from normal unit tests is that the golden values are often relatively large, and instead of being written and maintained by hand they are generated and updated automatically by the testing framework. For example, you may want to check that the logs of a particular workflow "look right", and you would like to check that they do not change unexpectedly, but you don't want to spend time typing out an entire log file by hand!
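For comparison, the file-based variant of this idea fits in a few lines of plain Scala. The sketch below is hypothetical: `GoldenFileSketch.check` and its `update` parameter are illustrative names, not uTest's `assertGoldenFile` API, and it assumes the golden value is stored as a UTF-8 text file.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

// Hypothetical sketch of a file-based golden check; not uTest's API.
object GoldenFileSketch {
  // Compare `actual` with the golden file's contents. When `update` is true,
  // a missing or mismatched golden file is (re)written instead of failing.
  def check(actual: String, golden: Path, update: Boolean): Unit = {
    val expected: Option[String] =
      if (Files.exists(golden)) Some(new String(Files.readAllBytes(golden), UTF_8))
      else None

    if (!expected.contains(actual)) {
      if (update) {
        Files.write(golden, actual.getBytes(UTF_8)) // regenerate the golden value
        ()
      } else {
        throw new AssertionError(
          s"$golden does not match: expected ${expected.getOrElse("<missing>")}, got $actual"
        )
      }
    }
  }
}
```

The first run with `update = true` creates the golden file; later runs with `update = false` pass until the output changes.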

For example, consider the FullRunLogsTests in the Mill build tool. These tests run some simple commands and assert against the shape of the output logs, with the dual goal of ensuring all the "expected" logging is present, and no unwanted debug messages appear in the output. Traditionally, they would be written something like this:

val res = eval("run", "--text", "hello")

val normalized = normalize(res)
assert(
  normalized ==
  List(
    "============================== run  <dashes>text hello ==============================",
    "[build.mill-<digits>/<digits>] compile",
    "[build.mill-<digits>] [info] compiling <digits> Scala sources to .../out/mill-build/compile.dest/classes ...",
    "[build.mill-<digits>] [info] done compiling",
    "[<digits>/<digits>] compile",
    "[<digits>] [info] compiling <digits> Java source to .../out/compile.dest/classes ...",
    "[<digits>] [info] done compiling",
    "[<digits>/<digits>] run",
    "[<digits>/<digits>] ============================== run  <dashes>text hello ============================== <digits>s"
  )
)

In this case the value we're asserting is a List of Strings, so putting the expected output in a separate file is possible. But asserting against other kinds of literal data structures is also common: primitives, Tuples, Lists, Maps, case classes, or any combination of these nested within each other. These may be less convenient to move to a separate file, and keeping them inline in your test code also avoids the indirection of jumping from file to file just to figure out what your test is doing.

While it is possible to manage these tests by hand, it can be quite tedious. When setting them up, typically a user would first use pprint.log from the PPrint library to print out the value:

val res = eval("run", "--text", "hello")

val normalized = normalize(res)
pprint.log(normalized)

Unlike the normal Java .toString, PPrint is optimized for producing well-formatted output that can be copy-pasted into your code: Lists are split across multiple lines and indented, Strings are indented, etc. The pprint.log call above would output:

FullRunLogsTests.scala:19 normalized: List(
  "============================== run  <dashes>text hello ==============================",
  "[build.mill-<digits>/<digits>] compile",
  "[build.mill-<digits>] [info] compiling <digits> Scala sources to .../out/mill-build/compile.dest/classes ...",
  "[build.mill-<digits>] [info] done compiling",
  "[<digits>/<digits>] compile",
  "[<digits>] [info] compiling <digits> Java source to .../out/compile.dest/classes ...",
  "[<digits>] [info] done compiling",
  "[<digits>/<digits>] run",
  "[<digits>/<digits>] ============================== run  <dashes>text hello ============================== <digits>s"
)

And the user would then copy-paste it into the test code to use in the assert.
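The copy-paste step only works because the printed text is valid Scala source. The sketch below is a deliberately tiny, hypothetical stand-in for that part of PPrint (`MiniLiteral` is not a real library): it quotes and escapes strings and puts one collection element per line, indenting nested values. Unlike PPrint, it never keeps short collections on one line, ignores width limits, and falls back to toString for everything else, which is only source-equivalent for simple values such as Int or Boolean.

```scala
// Hypothetical, minimal "value to Scala literal" renderer; not PPrint's code.
object MiniLiteral {
  // Escape a string so it can be pasted back into Scala source as a "..." literal.
  def quote(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case '\n' => sb.append("\\n")
      case '\t' => sb.append("\\t")
      case c    => sb.append(c)
    }
    sb.append('"').toString
  }

  // Render `value` as Scala source text, one element per line, with nested
  // values indented two spaces deeper than their parent.
  def render(value: Any, indent: Int = 0): String = {
    val inner = "  " * (indent + 1)
    val outer = "  " * indent
    def block(prefix: String, items: Iterable[String]): String =
      if (items.isEmpty) prefix + "()"
      else items.map(inner + _).mkString(prefix + "(\n", ",\n", "\n" + outer + ")")

    value match {
      case s: String   => quote(s)
      case ()          => "()"
      case xs: List[_] => block("List", xs.map(render(_, indent + 1)))
      case m: Map[_, _] =>
        block("Map", m.toList.map { case (k, v) => render(k, indent + 1) + " -> " + render(v, indent + 1) })
      // Case classes and tuples: rebuild them from their constructor arguments.
      case p: Product if p.productArity > 0 =>
        block(p.productPrefix, p.productIterator.map(render(_, indent + 1)).toList)
      // Everything else (Int, Boolean, case objects, ...) falls back to toString.
      case other => other.toString
    }
  }
}
```

For example, `MiniLiteral.render(List("a", "b"))` produces `List(` followed by the two quoted strings on their own indented lines and a closing `)`.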

However, it is not just setup that can be tedious: maintaining these tests as the behavior of your system evolves is tedious as well. For example, maybe we decide to replace the square brackets [...] with parentheses (...). That would cause the test to fail:

utest.AssertionError: normalized == ...
normalized: List[String] = List(
  "============================== run  <dashes>text hello ==============================",
  "(build.mill-<digits>/<digits>) compile",
  "[build.mill-<digits>] [info] compiling <digits> Scala sources to .../out/mill-build/compile.dest/classes ...",
  "[build.mill-<digits>] [info] done compiling",
  "(<digits>/<digits>) compile",
  "[<digits>] [info] compiling <digits> Java source to .../out/compile.dest/classes ...",
  "[<digits>] [info] done compiling",
  "(<digits>/<digits>) run",
  "[<digits>/<digits>] ============================== run  <dashes>text hello ============================== <digits>s"
)
normalized != ...:
  List(
    "============================== run  <dashes>text hello ==============================",
-   "(build.mill-<digits>/<digits>) compile",
+   "[build.mill-<digits>/<digits>] compile",
    "[build.mill-<digits>] [info] compiling <digits> Scala sources to .../out/mill-build/compile.dest/classes ...",
    "[build.mill-<digits>] [info] done compiling",
-   "(<digits>/<digits>) compile",
+   "[<digits>/<digits>] compile",
    "[<digits>] [info] compiling <digits> Java source to .../out/compile.dest/classes ...",
    "[<digits>] [info] done compiling",
-   "(<digits>/<digits>) run",
+   "[<digits>/<digits>] run",
    "[<digits>/<digits>] ============================== run  <dashes>text hello ============================== <digits>s"
  )

And the user would have to copy-paste the new normalized value into their assertion to make the test pass. Again, this isn't rocket science, but it can be very tedious: updating a large test suite to match changed output isn't fun, and isn't a good use of a four-year university computer-science degree and decades of industry experience.

Setting Up uTest Golden Literal Testing

uTest 0.9.0 ships with a new assertGoldenLiteral method. To set this up the first time, you can call it with the runtime value on the left, and a dummy value () on the right:

val res = eval("run", "--text", "hello")

val normalized = normalize(res)
assertGoldenLiteral(
  normalized,
  ()
)

Running this test fails with an assertion error, since the placeholder () does not equal normalized.

If you then run the test again with UTEST_UPDATE_GOLDEN_TESTS=1, you will see that uTest has recognized the mismatch, and gone and updated your FullRunLogsTests.scala source file on your behalf!

+ mill.integration.FullRunLogsTests.ticker 8970ms  
UTEST_UPDATE_GOLDEN_TESTS detected, uTest applying 1 golden fixes to file /Users/lihaoyi/Github/mill/integration/feature/full-run-logs/src/FullRunLogsTests.scala
Updating line:column 46:8 to 46:10
Tests: 1, Passed: 1, Failed: 0

$ git diff
diff --git a/integration/feature/full-run-logs/src/FullRunLogsTests.scala b/integration/feature/full-run-logs/src/FullRunLogsTests.scala
index 654ac4ac0f7..9c80dafd0e7 100644
--- a/integration/feature/full-run-logs/src/FullRunLogsTests.scala
+++ b/integration/feature/full-run-logs/src/FullRunLogsTests.scala
@@ -44,7 +44,17 @@ object FullRunLogsTests extends UtestIntegrationTestSuite {
       val normalized = normalize(res)
       assertGoldenLiteral(
         normalized,
-        ()
+        List(
+          "============================== run  <dashes>text hello ==============================",
+          "[build.mill-<digits>/<digits>] compile",
+          "[build.mill-<digits>] [info] compiling <digits> Scala sources to .../out/mill-build/compile.dest/classes ...",
+          "[build.mill-<digits>] [info] done compiling",
+          "[<digits>/<digits>] compile",
+          "[<digits>] [info] compiling <digits> Java source to .../out/compile.dest/classes ...",
+          "[<digits>] [info] done compiling",
+          "[<digits>/<digits>] run",
+          "[<digits>/<digits>] ============================== run  <dashes>text hello ============================== <digits>s"
+        )
       )
     }
     test("keepGoingFailure") - integrationTest { tester =>

If you run the test again now, you will find it passes given the new literal that has been spliced in by UTEST_UPDATE_GOLDEN_TESTS=1.

In fact, you do not need to set up your asserts one at a time: you can write an entire test suite with multiple assertGoldenLiteral calls, leave them all stubbed out with (), and run the test with UTEST_UPDATE_GOLDEN_TESTS=1 to fill them in all at once!

Maintaining Golden Literal Tests

Earlier we mentioned that apart from setting up tests, keeping them up to date is also tedious. With uTest's golden literal tests, if you make a behavioral change like substituting the square brackets [...] with round parentheses (...), assertGoldenLiteral recognizes the difference and highlights the lines and characters that differ.

And if you run the test again with UTEST_UPDATE_GOLDEN_TESTS=1, uTest fixes it on your behalf in the source code, updating the data structure in your test to the new value:

+ mill.integration.FullRunLogsTests.ticker 8527ms  
UTEST_UPDATE_GOLDEN_TESTS detected, uTest applying 1 golden fixes to file /Users/lihaoyi/Github/mill/integration/feature/full-run-logs/src/FullRunLogsTests.scala
Updating line:column 46:8 to 56:9
Tests: 1, Passed: 1, Failed: 0

$ git diff
diff --git a/integration/feature/full-run-logs/src/FullRunLogsTests.scala b/integration/feature/full-run-logs/src/FullRunLogsTests.scala
index 9c80dafd0e7..3698f8980a9 100644
--- a/integration/feature/full-run-logs/src/FullRunLogsTests.scala
+++ b/integration/feature/full-run-logs/src/FullRunLogsTests.scala
@@ -46,13 +46,13 @@ object FullRunLogsTests extends UtestIntegrationTestSuite {
         normalized,
         List(
           "============================== run  <dashes>text hello ==============================",
-          "[build.mill-<digits>/<digits>] compile",
+          "(build.mill-<digits>/<digits>) compile",
           "[build.mill-<digits>] [info] compiling <digits> Scala sources to .../out/mill-build/compile.dest/classes ...",
           "[build.mill-<digits>] [info] done compiling",
-          "[<digits>/<digits>] compile",
+          "(<digits>/<digits>) compile",
           "[<digits>] [info] compiling <digits> Java source to .../out/compile.dest/classes ...",
           "[<digits>] [info] done compiling",
-          "[<digits>/<digits>] run",
+          "(<digits>/<digits>) run",
           "[<digits>/<digits>] ============================== run  <dashes>text hello ============================== <digits>s"
         )
       )

Effectively, for the subset of simple asserts that assertGoldenLiteral is suitable for, it greatly reduces the busy-work of writing and maintaining a test suite. Rather than tediously writing out the expected output yourself, and endlessly tweaking it as the behavior evolves, you can instead just ask uTest to update the expected output on your behalf and it will update your source code appropriately!

Implementation Details

How uTest's assertGoldenLiteral works is itself interesting, and worth a mention. It has the following signature:

def assertGoldenLiteral(actualValue: Any, goldenLiteral: GoldenFix.Span[Any])
                       (implicit reporter: GoldenFix.Reporter): Unit

assertGoldenLiteral works with Anys: Scala (and its underlying Java or JavaScript runtimes) relies on "Universal Equality", where any value can be compared to any other value, and uTest does not make any innovations in that regard. The actualValue: Any isn't particularly interesting, but the right-hand value goldenLiteral: GoldenFix.Span[Any] is where the magic happens. GoldenFix.Span is similar to the sourcecode.Text type from the com-lihaoyi/sourcecode library, and is defined as:

case class Span[+T](value: T, sourceFile: String, startOffset: Int, endOffset: Int)

Where sourcecode.Text captures just the textual contents of the expression, GoldenFix.Span captures the raw source file path and the start/end offsets of the expression within that file. Like sourcecode.Text, GoldenFix.Span is typically constructed via an implicit macro conversion that turns an arbitrary value into a GoldenFix.Span[Any]. For example, when you write:

assertGoldenLiteral(
  normalized,
  ()
)

() is not a GoldenFix.Span, and so the implicit conversion GoldenFix.Span.generate expands that into:

assertGoldenLiteral(
  normalized,
  GoldenFix.Span.generate(())
)

generate is a macro, which then expands into:

assertGoldenLiteral(
  normalized,
  GoldenFix.Span(
    (),
    sourceFile = "integration/feature/full-run-logs/src/FullRunLogsTests.scala",
    startOffset = 1546,
    endOffset = 1548
  )
)

This way, assertGoldenLiteral is able to capture exactly where the goldenLiteral: GoldenFix.Span[Any] expression comes from in the source code on disk. We now have all the information we need to find the original source file, locate the goldenLiteral expression inside it, and replace it with a new value when necessary.
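Given only a file's contents and two character offsets, locating and replacing the literal is plain string manipulation. The sketch below is hypothetical (`SpanSketch`, `slice`, `splice` and `lineColumn` are illustrative names, not uTest's) and assumes the offsets are character indices into the file contents. `lineColumn` reports zero-based line and column numbers, which appears consistent with the `Updating line:column 46:8 to 46:10` log shown earlier, where the `()` sits on line 47 of the diff, indented by eight spaces.

```scala
// Hypothetical helpers for working with a [start, end) span of source text.
object SpanSketch {
  // The source text covered by [start, end), e.g. "()" for the placeholder.
  def slice(source: String, start: Int, end: Int): String =
    source.substring(start, end)

  // Replace the characters in [start, end) with `replacement`.
  def splice(source: String, start: Int, end: Int, replacement: String): String =
    source.substring(0, start) + replacement + source.substring(end)

  // Zero-based (line, column) of a character offset.
  def lineColumn(source: String, offset: Int): (Int, Int) = {
    val before = source.substring(0, offset)
    val line = before.count(_ == '\n')
    val column = offset - (before.lastIndexOf('\n') + 1)
    (line, column)
  }
}
```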

To make use of the GoldenFix.Span, assertGoldenLiteral is defined as:

def assertGoldenLiteral(actualValue: Any, goldenLiteral: GoldenFix.Span[Any])
                       (implicit reporter: GoldenFix.Reporter): Unit = {
  if (actualValue != goldenLiteral.value) {
    if (!sys.env.contains("UTEST_UPDATE_GOLDEN_TESTS")) {
      // Normal mode: fail the test as a regular assertion would
      throwAssertionError(goldenLiteral.sourceFile, goldenLiteral.value, actualValue)
    } else {
      // Update mode: record the mismatch so the source file can be rewritten
      reporter.apply(actualValue, goldenLiteral)
    }
  }
}

When the two values actualValue and goldenLiteral.value are not equal and UTEST_UPDATE_GOLDEN_TESTS is not given, we throw an assertion error as normal. But when UTEST_UPDATE_GOLDEN_TESTS is given, we do not throw and instead simply pass the actualValue and goldenLiteral to the implicit reporter: GoldenFix.Reporter that is passed automatically from the test suite. This reporter then knows:

Which assertGoldenLiteral calls failed
What file each one was in and exactly where in that file the goldenLiteral is from
What the actualValue was
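The reporter-based control flow described above can be modelled roughly as follows. This is a simplified, hypothetical model rather than uTest's code: `Span`, `Fix` and `Reporter` are stand-ins, and the update flag is passed as a parameter (instead of being read from `UTEST_UPDATE_GOLDEN_TESTS`) so both branches are easy to exercise.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical model of the golden-literal assertion; not uTest's implementation.
object GoldenAssertSketch {
  // Where a golden literal lives in the source, plus its current value.
  case class Span[+T](value: T, sourceFile: String, startOffset: Int, endOffset: Int)

  // One pending source edit: replace the span's text with a rendering of `actual`.
  case class Fix(sourceFile: String, startOffset: Int, endOffset: Int, actual: Any)

  // Collects fixes instead of failing, so they can be applied after the tests run.
  class Reporter {
    val fixes: ListBuffer[Fix] = ListBuffer.empty[Fix]
    def apply(actual: Any, golden: Span[Any]): Unit = {
      fixes += Fix(golden.sourceFile, golden.startOffset, golden.endOffset, actual)
      ()
    }
  }

  def assertGoldenLiteral(actual: Any, golden: Span[Any], update: Boolean)
                         (implicit reporter: Reporter): Unit = {
    if (actual != golden.value) {
      if (!update)
        throw new AssertionError(s"${golden.sourceFile}: expected ${golden.value}, got $actual")
      else
        reporter(actual, golden) // remember where and what to rewrite
    }
  }
}
```

Because the reporter only accumulates fixes, every mismatching assertion in a suite can be collected in a single run, which is what lets you stub out many literals with () and fill them all in at once.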

uTest can then pretty-print the actual value using the PPrint library, indent it appropriately based on the indentation of the original expression, and splice it into the source code where the original expression came from. Because PPrint is designed to produce source-equivalent, well-formatted output for the given value, the updated code can now be compiled and run with that assertGoldenLiteral check passing!
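The splice step has two details worth getting right. Continuation lines of a multi-line rendering need the original expression's indentation prepended, and when one file has several fixes, applying them from the last offset to the first keeps the earlier offsets valid. The sketch below shows both under stated assumptions: `SpliceSketch` and `applyFixes` are hypothetical helpers, the fixes are assumed not to overlap, and the rendered text is assumed to come from a pretty-printer such as PPrint.

```scala
// Hypothetical helpers for splicing rendered literals back into a source file.
object SpliceSketch {
  // Leading whitespace of the line containing `offset`.
  def indentationAt(source: String, offset: Int): String = {
    val lineStart = source.lastIndexOf('\n', offset - 1) + 1
    source.substring(lineStart).takeWhile(c => c == ' ' || c == '\t')
  }

  // Apply non-overlapping (start, end, renderedText) replacements to one file.
  // Processing from the highest start offset downwards means each edit only
  // changes text after the offsets that are still waiting to be applied.
  def applyFixes(source: String, fixes: Seq[(Int, Int, String)]): String =
    fixes.sortBy(-_._1).foldLeft(source) { case (src, (start, end, rendered)) =>
      val indent = indentationAt(src, start)
      // The first line lands where the old literal began; the rest are re-indented.
      val reindented = rendered.split("\n", -1).mkString("\n" + indent)
      src.substring(0, start) + reindented + src.substring(end)
    }
}
```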

Conclusion

assertGoldenLiteral only works for asserting equality with "literals". These are values that can be pretty-printed using the PPrint library, typically primitives, collections, and case classes. It cannot totally replace all existing usages of assert, assertThrows, assertCompileError, etc.

However, in the cases where assertGoldenLiteral (or its sister method assertGoldenFile) can be applied, it works surprisingly well, cutting down the busy-work of keeping your test suite up to date and letting you spend more time on the problem at hand.

The reason assertGoldenLiteral works so well is that it is basically what users were doing by hand anyway:

Users were already running tests and seeing them fail
Users were already using pprint.log to print out a copy-paste-able version of the thing they want to assert against
Users were already splicing in the PPrint output into the source code where the original literal was, and fixing up the indentation as necessary.

Lots of testing frameworks have some equivalent to uTest's assertGoldenFile, which can automatically update a file on disk when it detects a mismatch. But uTest 0.9.0's new assertGoldenLiteral takes that one step further and is able to overwrite data literals in your source code when it detects a mismatch. That is pretty uncommon among testing frameworks across all languages, and I hope some of you will try it out in the new version of uTest!