
FastParse 0.4.3


Fast to write, Fast running Parsers in Scala

import fastparse.all._

val number: P[Int] = P( CharIn('0'to'9').rep(1).!.map(_.toInt) )
val parens: P[Int] = P( "(" ~/ addSub ~ ")" )
val factor: P[Int] = P( number | parens )

val divMul: P[Int] = P( factor ~ (CharIn("*/").! ~/ factor).rep ).map(eval)
val addSub: P[Int] = P( divMul ~ (CharIn("+-").! ~/ divMul).rep ).map(eval)
val expr: P[Int]   = P( addSub ~ End )
val Parsed.Success(2, _) = expr.parse("1+1")
val Parsed.Success(15, _) = expr.parse("(1+1*2)+3*4")
val Parsed.Success(21, _) = expr.parse("((1+1*2)+(3*4*5))/3")
val Parsed.Failure(expected, failIndex, extra) = expr.parse("1+1*")
assert(expected == (number | parens), failIndex == 4)

FastParse is a parser-combinator library for Scala that lets you quickly and easily write recursive-descent parsers for both text and binary data.

FastParse is a project by Li Haoyi. If you use FastParse and enjoyed it, please chip in to support our development at https://www.patreon.com/lihaoyi.

The following sections will introduce you to FastParse and how to use it. You can also watch this talk:

The talk gives a quick one-hour tour of how to use FastParse, the motivation behind the library, and how it fits into the bigger picture of how programmers parse unstructured text.

Getting Started


To begin using FastParse, add

"com.lihaoyi" %% "fastparse" % "0.4.3"

to your SBT configuration. To use it with Scala.js, you'll need

"com.lihaoyi" %%% "fastparse" % "0.4.3"

Writing Parsers


Basic

The simplest parser matches a single string:

import fastparse.all._
val parseA = P( "a" )

val Parsed.Success(value, successIndex) = parseA.parse("a")
assert(value == (), successIndex == 1)

val failure = parseA.parse("b").asInstanceOf[Parsed.Failure]
assert(
  failure.lastParser == ("a": P0),
  failure.index == 0,
  failure.extra.traced.trace == """parseA:1:1 / "a":1:1 ..."b""""
)

Such a parser returns a Parsed.Success if the input matches the string, and otherwise returns a Parsed.Failure.

As you can see, by default the Parsed.Success contains a (): Unit, unless you use Capture or Map, described below. Apart from the structured data of the failure, Parsed.Failure also contains a nice human-readable trace of the parse, showing the stack of parsers that were in progress when the parse failed. The trace can be obtained via failure.extra.traced.trace, as shown above.

You can also wrap the strings in an IgnoreCase("...") if you want the matching to be case-insensitive.

The P(...) lets you write mutually recursive parsers, without running into circular initialization problems, and gives the parser a nice, human-readable name (from the val it is being assigned to) which appears in parse error messages. In general, every time you assign a parser to a val, you should wrap it in P(...).

Sequence

val ab = P( "a" ~ "b" )

val Parsed.Success(_, 2) = ab.parse("ab")

val Parsed.Failure(parser, 1, _) = ab.parse("aa")
assert(parser == ("b": P0))

You can combine two parsers with the ~ operator. This creates a new parser that only succeeds if both left and right parsers succeed one after another.

Repeat

val ab = P( "a".rep ~ "b" )
val Parsed.Success(_, 8) = ab.parse("aaaaaaab")
val Parsed.Success(_, 4) = ab.parse("aaaba")

val abc = P( "a".rep(sep="b") ~ "c")
val Parsed.Success(_, 8) = abc.parse("abababac")
val Parsed.Failure(parser, 3, _) = abc.parse("abaabac")

val ab4 = P ( "a".rep(min=2, max=4, sep="b") )
val Parsed.Success(_, 7) = ab4.parse("ababababababa")

val ab2exactly = P( "ab".rep(exactly=2) )
val Parsed.Success(_, 4) = ab2exactly.parse("abab")

val ab4c = P ( "a".rep(min=2, max=4, sep="b") ~ "c" )
val Parsed.Failure(_, 1, _) = ab4c.parse("ac")
val Parsed.Success(_, 4) = ab4c.parse("abac")
val Parsed.Success(_, 8) = ab4c.parse("abababac")
val Parsed.Failure(_, 7, _) = ab4c.parse("ababababac")

The .rep method creates a new parser that attempts to parse the given parser zero or more times. If you want to bound the number of repetitions, you can use .rep(min = 2, max = 4), or the shorter .rep(1) for one or more times. There is also an exactly parameter which, when defined, behaves as if min and max were both set to it. You can optionally provide a sep argument which acts as a separator between the repetitions of the original parser, such as a comma in .rep(sep = ",").

Optional

val option = P( "c".? ~ "a".rep(sep="b").! ~ End)

val Parsed.Success("aba", 3) = option.parse("aba")
val Parsed.Success("aba", 4) = option.parse("caba")

Similar to .rep is .?, which creates a new parser that attempts to parse the given parser zero or one time.

Either

val either = P( "a".rep ~ ("b" | "c" | "d") ~ End)

val Parsed.Success(_, 6) = either.parse("aaaaab")
val Parsed.Failure(parser, 5, _) = either.parse("aaaaae")
assert(parser == ("b" | "c" | "d"))

The | operator tries the parser on the left, and if that fails, tries the one on the right, failing only if both parsers fail.

End, Start

val noEnd = P( "a".rep ~ "b")
val withEnd = P( "a".rep ~ "b" ~ End)

val Parsed.Success(_, 4) = noEnd.parse("aaaba")
val Parsed.Failure(End, 4, _) = withEnd.parse("aaaba")

The End parser only succeeds at the end of the input string. By default, a Parser does not need to consume the whole input, and can succeed early after consuming only a portion of it (exactly how much input was consumed is stored in the Success#index attribute). By using End, we can make the parse fail if it doesn't consume the entire input.

There is also a similar Start parser, which only succeeds at the start of the input.

val ab = P( (("a" | Start) ~ "b").rep ~ End).!

val Parsed.Success("abab", 4) = ab.parse("abab")
val Parsed.Success("babab", 5) = ab.parse("babab")

val Parsed.Failure(parser, 2, _) = ab.parse("abb")

Pass, Fail

val Parsed.Success((), 0) = Pass.parse("asdad")
val Parsed.Failure(Fail, 0, _) = Fail.parse("asdad")

These two parsers always succeed, or always fail, respectively. Neither consumes any input.

Index

val finder = P( "hay".rep ~ Index ~ "needle" ~ "hay".rep )

val Parsed.Success(9, _) = finder.parse("hayhayhayneedlehay")

Always succeeds, and provides the current index of the parse into the input string. This is useful, for example, for providing source locations for AST nodes. Consumes no input.

Capture

val capture1 = P( "a".rep.! ~ "b" ~ End)

val Parsed.Success("aaa", 4) = capture1.parse("aaab")

val capture2 = P( "a".rep.! ~ "b".! ~ End)

val Parsed.Success(("aaa", "b"), 4) = capture2.parse("aaab")

val capture3 = P( "a".rep.! ~ "b".! ~ "c".! ~ End)

val Parsed.Success(("aaa", "b", "c"), 5) = capture3.parse("aaabc")

val captureRep = P( "a".!.rep ~ "b" ~ End)

val Parsed.Success(Seq("a", "a", "a"), 4) = captureRep.parse("aaab")

val captureOpt = P( "a".rep ~ "b".!.? ~ End)

val Parsed.Success(Some("b"), 4) = captureOpt.parse("aaab")

So far, all the parsers go over the input text but do not return any useful value: the Success#value attribute was always (). In order to make them do so, you use the .! operation to capture the section of the input string the parser parsed.

Note the types of each parser: capture1 is a P[String], capture2 a P[(String, String)], capture3 a P[(String, String, String)], captureRep a P[Seq[String]], and captureOpt a P[Option[String]].

In general, if you have a parser returning a TupleN, capturing one more section turns it into a TupleN+1. Furthermore, if you capture within a .rep or .? optional parser, it becomes a Parser[Seq[T]] or Parser[Option[T]] respectively.

AnyChar

See also AnyElem in Abstraction
val ab = P( "'" ~ AnyChar.! ~ "'" )

val Parsed.Success("-", 3) = ab.parse("'-'")

val Parsed.Failure(parser, 2, _) = ab.parse("'-='")
assert(parser == ("'": P0))

This parser parses any single character successfully. It almost always succeeds, except if there simply aren't any characters left to parse.

There is also a plural AnyChars(count: Int) version that parses exactly count characters in a row, regardless of what they are.

Positive Lookahead

val keyword = P( ("hello" ~ &(" ")).!.rep )

val Parsed.Success(Seq("hello"), _) = keyword.parse("hello ")
val Parsed.Success(Seq(), _) = keyword.parse("helloX")

The &(...) operator wraps a parser, succeeding only if the wrapped parser succeeds, but consuming no input. It is useful for checks like "these characters must be followed by a whitespace, but don't consume the whitespace".

Negative Lookahead

val keyword = P( "hello" ~ !" " ~ AnyChar ~ "world" ).!

val Parsed.Success("hello-world", _) = keyword.parse("hello-world")
val Parsed.Success("hello_world", _) = keyword.parse("hello_world")

val Parsed.Failure(parser, 6, _) = keyword.parse("hello world")
assert(parser == !(" "))

The !... operator wraps a parser and succeeds only if the wrapped parser fails, also consuming no input. It is useful in combination with parsers like AnyChar to restrict what they can parse.

Map

val binary = P( ("0" | "1" ).rep.! )
val binaryNum = P( binary.map(Integer.parseInt(_, 2)) )

val Parsed.Success("1100", _) = binary.parse("1100")
val Parsed.Success(12, _) = binaryNum.parse("1100")

Up till now, we've only dealt with parsers that return () or the raw Strings they captured.

.map lets you convert an arbitrary Parser[T] into a Parser[V] by providing a T => V function. This is useful for converting the strings and tuples/seqs/options of strings into more useful data-structures.

FlatMap

val leftTag = P( "<" ~ (!">" ~ AnyChar).rep(1).! ~ ">" )
def rightTag(s: String) = P( "</" ~ s.! ~ ">" )
val xml = P( leftTag.flatMap(rightTag) )

val Parsed.Success("a", _) = xml.parse("<a></a>")
val Parsed.Success("abcde", _) = xml.parse("<abcde></abcde>")

val failure = xml.parse("<abcde></edcba>").asInstanceOf[Parsed.Failure]
assert(
  failure.extra.traced.trace == """xml:1:1 / rightTag:1:8 / "abcde":1:10 ..."edcba>""""
)

.flatMap allows you to dynamically choose a parser to continue with, given the result of the current parser. The example above uses it to parse balanced XML tags. .flatMap can be used to parse indentation-based grammars, and is used to do so in Scalatex.

Note that the function given to .flatMap is evaluated every time this parser is tried. Be conscious of the cost of re-creating the resulting parser on every attempt, since FastParse parsers are somewhat expensive to initialize despite being fast per run. If possible, construct the parsers beforehand or memoize/cache them to avoid initializing them wastefully.
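As a sketch of the memoize/cache suggestion, here is a minimal generic memoizer. This is a hypothetical helper, not part of FastParse's API, and it uses a plain string-building function as a stand-in for a parser-constructing function like rightTag:

```scala
import scala.collection.mutable

// Minimal memoizer sketch: wraps any A => B and caches results, so an
// expensive parser-constructing function (like the one passed to
// .flatMap) is only evaluated once per distinct input.
// Hypothetical helper, not part of FastParse.
def memoize[A, B](f: A => B): A => B = {
  val cache = mutable.Map.empty[A, B]
  a => cache.getOrElseUpdate(a, f(a))
}

// Usage: count how many times the underlying function actually runs.
var constructed = 0
val cachedRightTag = memoize { (s: String) =>
  constructed += 1
  "</" + s + ">" // stand-in for a real rightTag parser constructor
}

cachedRightTag("abcde")
cachedRightTag("abcde")
assert(constructed == 1) // the second call hit the cache
```

Note that a simple mutable.Map cache like this is not thread-safe; for long-lived parsers you may want a concurrent map instead.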

As is normal in Scala, you can replace a .flatMap call with a for-comprehension, as below:

val leftTag = P( "<" ~ (!">" ~ AnyChar).rep(1).! ~ ">" )
def rightTag(s: String) = P( "</" ~ s.! ~ ">" )
val xml = P(
  for{
    s <- leftTag
    right <- rightTag(s)
  } yield right
)

val Parsed.Success("a", _) = xml.parse("<a></a>")
val Parsed.Success("abcde", _) = xml.parse("<abcde></abcde>")

val failure = xml.parse("<abcde></edcba>").asInstanceOf[Parsed.Failure]
assert(
  failure.extra.traced.trace == """xml:1:1 / rightTag:1:8 / "abcde":1:10 ..."edcba>""""
)

The for-comprehension version is equivalent and behaves exactly the same.

Filter

val digits = P(CharIn('0' to '9').rep(1).!).map(_.toInt)
val even = digits.filter(_ % 2 == 0)
val Parsed.Success(12, _) = even.parse("12")
val failure = even.parse("123").asInstanceOf[Parsed.Failure]
assert("""digits.filter\(.*\)$""".r.findPrefixOf(even.toString).isDefined)
assert("""digits.filter\(.*\):1:1 ..."123"$""".r.findPrefixOf(failure.extra.traced.trace).isDefined)

.filter allows you to supply a predicate T => Boolean which is applied to the successful result of the current parser. If the predicate is true, the filtered parser succeeds; otherwise it fails. The example above uses .filter on digits to parse only even numbers successfully, while odd numbers fail. If the current parser fails, that failure is simply passed along.

Opaque

Sometimes it's useful to hide a parser's implementation details and provide a higher-level error message instead. .opaque achieves exactly that.

val digit = CharIn('0' to '9')
val letter = CharIn('A' to 'Z')
def twice[T](p: Parser[T]) = p ~ p
def errorMessage[T](p: Parser[T], str: String) =
  ParseError(p.parse(str).asInstanceOf[Parsed.Failure]).getMessage

// Portuguese number plate format since 2006
val numberPlate = P(twice(digit) ~ "-" ~ twice(letter) ~ "-" ~ twice(digit))

assert(errorMessage(numberPlate, "11-A1-22") == """
  |found "1-22", expected CharIn("ABCDEFGHIJKLMNOPQRSTUVWXYZ") at index 4
  |11-A1-22
  |    ^""".stripMargin.trim)

// Suppress implementation details from the error message
val opaqueNumberPlate = numberPlate.opaque("<number-plate>")

assert(errorMessage(opaqueNumberPlate, "11-A1-22") == """
  |found "11-A1-22", expected <number-plate> at index 0
  |11-A1-22
  |^""".stripMargin.trim)

.opaque wraps the target parser in an Opaque combinator, which only succeeds or fails as a single entity and leaves no traces of underlying parsers on the stack.

Log

val logged = mutable.Buffer.empty[String]
implicit val logger = Logger(logged.append(_))

val DeepFailure = P( "C" )
val Foo = P( (DeepFailure.log() | "A".log()) ~ "B".!.log() ).log()

Foo.parse("AB")

val allLogged = logged.mkString("\n")

val expected =
  """+Foo:1:1
    |  +DeepFailure:1:1
    |  -DeepFailure:1:1:Failure(DeepFailure:1:1 / "C":1:1 ..."AB")
    |  +"A":1:1
    |  -"A":1:1:Success(1:2)
    |  +"B":1:2
    |  -"B":1:2:Success(1:3)
    |-Foo:1:1:Success(1:3)
    |
  """.stripMargin.trim
assert(allLogged == expected)

Debugging parsers is often done with the .log() method, which logs output whenever the parser is tried, and when it succeeds or fails, together with the location and other data at those points (traces on failures, results on successes, the presence of Cuts, ...). You can define custom loggers as we've done here, or just leave the default, which prints to stdout.

Generally, if a parser is doing something wrong, the workflow is to add .log() calls to the suspect rules, re-run the parse, and narrow down where the logged behavior first diverges from what you expect.

It's a non-trivial process, but it is generally not hard to figure out what's happening this way.

Intrinsics

In theory, all possible parsers can be put together using the above tools. In practice, a few more tools are provided for convenience or performance:

CharPred

See also ElemPred in Abstraction
val cp = P( CharPred(_.isUpper).rep.! ~ "." ~ End )

val Parsed.Success("ABC", _) = cp.parse("ABC.")
val Parsed.Failure(_, 2, _) = cp.parse("ABc.")

CharPred takes a Char => Boolean predicate and creates a parser that parses any single character that satisfies that predicate. e.g. you can use any of the helpful methods on scala.Char to check if a Char isUpper, isDigit, isControl, etc. etc.

Note that this builds a high-performance bit-packed lookup table covering the full range of valid characters, up to 65k entries. That means creating such a parser has a one-time cost in memory (up to 8k bytes) and time. This should not matter, since FastParse parsers are long-lived and reusable, though you may want to consciously avoid creating too many of them repeatedly.
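To make the cost concrete, here is a sketch of the bit-packed lookup-table idea. This is an illustration of the general technique, not FastParse's actual internals:

```scala
// Sketch of a bit-packed character lookup table (simplified; not
// FastParse's actual implementation): precompute one bit per Char, so
// the per-character check during parsing is a shift and a mask rather
// than re-running the predicate.
class CharBitSet(pred: Char => Boolean) {
  // 65536 possible Chars / 64 bits per Long = 1024 Longs = 8k bytes
  private val words = new Array[Long](65536 / 64)
  for (c <- Char.MinValue to Char.MaxValue if pred(c))
    words(c >> 6) |= 1L << (c & 63)

  def apply(c: Char): Boolean = (words(c >> 6) >> (c & 63) & 1L) != 0
}

val isUpper = new CharBitSet(_.isUpper) // one-time 65536 predicate calls
assert(isUpper('A') && !isUpper('a'))
```

The construction loop runs the predicate 65536 times up front; every lookup afterwards is a couple of bit operations, which is what makes the table worth its one-time cost for long-lived parsers.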

You can use CharPred.raw if you want to avoid pre-computing the lookup table, in which case the predicate is re-run repeatedly during parsing.

val cp = P( CharPred.raw(_.isUpper).rep.! ~ "." ~ End )

val Parsed.Success("ABC", _) = cp.parse("ABC.")
val Parsed.Failure(_, 2, _) = cp.parse("ABc.")

This could be useful if you find your parsers are taking too long to initialize and don't mind sacrificing a bit of steady-state performance.

CharIn

See also ElemIn in Abstraction
val ci = P( CharIn("abc", "xyz").rep.! ~ End )

val Parsed.Success("aaabbccxyz", _) = ci.parse("aaabbccxyz")
val Parsed.Failure(_, 7, _) = ci.parse("aaabbccdxyz.")

val digits = P( CharIn('0' to '9').rep.! )

val Parsed.Success("12345", _) = digits.parse("12345abcde")
val Parsed.Success("123", _) = digits.parse("123abcde45")

Similar to CharPred, except you pass in sequences of valid characters rather than a predicate.

CharIn also builds a lookup table which it uses during parsing. As a result, it's much faster to execute than if you had used "a" | "b" | "c" | "d" | ... to combine a bunch of single-character parsers. The same warning as CharPred about the one-time cost of construction applies.

CharsWhile

See also ElemsWhile in Abstraction
val cw = P( CharsWhile(_ != ' ').! )

val Parsed.Success("12345", _) = cw.parse("12345")
val Parsed.Success("123", _) = cw.parse("123 45")

A repeated version of CharPred: this parser continually chomps away at characters as long as they continue to pass the given predicate.

This is a very fast parser, ideal for quickly consuming large numbers of characters. The same warning as CharPred about the one-time cost of construction applies.
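The "chomping" loop can be sketched as a plain index-advancing scan. This is a simplified illustration of the technique, not FastParse's internals:

```scala
// Sketch of the scan behind a CharsWhile-style parser (simplified):
// advance an index while the predicate holds, with no per-character
// allocation or backtracking bookkeeping.
def charsWhile(input: String, start: Int, pred: Char => Boolean): Int = {
  var i = start
  while (i < input.length && pred(input.charAt(i))) i += 1
  i // index of the first non-matching character
}

val end = charsWhile("123 45", 0, _ != ' ')
assert(end == 3)
assert("123 45".substring(0, end) == "123") // the captured text
```

A tight loop like this is why CharsWhile is much faster than the equivalent CharPred(...).rep built from general-purpose combinators.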

You can use CharsWhile.raw if you want to avoid pre-computing the lookup table, in which case the predicate is re-run repeatedly during parsing.

val cw = P( CharsWhile.raw(_ != ' ').! )

val Parsed.Success("12345", _) = cw.parse("12345")
val Parsed.Success("123", _) = cw.parse("123 45")

This could be useful if you find your parsers are taking too long to initialize and don't mind sacrificing a bit of steady-state performance.

CharsWhileIn

See also ElemsWhileIn in Abstraction
val cw = P( CharsWhileIn("123456789").! )

val Parsed.Success("12345", _) = cw.parse("12345")
val Parsed.Success("123", _) = cw.parse("123 45")

A combination of CharsWhile and CharIn, this parser continues consuming characters as long as they are within the set of characters you passed to it.

If you find yourself using CharsWhile with contains e.g. CharsWhile("abc".contains), you probably could use CharsWhileIn instead.

StringIn

See also SeqIn in Abstraction
val si = P( StringIn("cow", "cattle").!.rep )

val Parsed.Success(Seq("cow", "cattle"), _) = si.parse("cowcattle")
val Parsed.Success(Seq("cow"), _) = si.parse("cowmoo")

Quickly parses one of any number of strings that you give it. Behind the scenes, it converts the list of strings into a Trie so it can attempt to parse all of them in a single pass.

As a result, this is much faster to execute than if you had combined the individual strings with "cow" | "cattle" | ....
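The single-pass trie idea can be sketched as follows. This is a simplified illustration of the technique, not StringIn's actual implementation:

```scala
// Sketch of the trie behind a StringIn-style parser (simplified): walk
// the input and the trie together, remembering the longest match seen,
// instead of re-scanning the input once per alternative.
class Trie(strings: String*) {
  private case class Node(
    children: collection.mutable.Map[Char, Node] = collection.mutable.Map.empty,
    var terminal: Boolean = false)
  private val root = Node()
  for (s <- strings) {
    var n = root
    for (c <- s) n = n.children.getOrElseUpdate(c, Node())
    n.terminal = true // a full alternative ends at this node
  }
  // Returns the length of the longest alternative matching at `start`, or -1
  def query(input: String, start: Int): Int = {
    var n = root; var i = start; var best = -1
    while (i < input.length && n.children.contains(input.charAt(i))) {
      n = n.children(input.charAt(i)); i += 1
      if (n.terminal) best = i - start
    }
    best
  }
}

val t = new Trie("cow", "cattle")
assert(t.query("cowcattle", 0) == 3) // matches "cow"
assert(t.query("cowcattle", 3) == 6) // matches "cattle"
assert(t.query("moo", 0) == -1)      // no match
```

Because shared prefixes are stored once, each input character is examined at most once per query, regardless of how many alternatives you pass in.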

There is also a StringInIgnoreCase parser you can use if you want to match things case-insensitively.

Cuts

A "cut" (~/ in FastParse) is a marker in a recursive-descent parser that states "you cannot backtrack past this point". In FastParse, this serves two purposes: it makes failure messages more precise, and it lets streaming parsing release buffered input that can no longer be backtracked into.

No Cuts

val alpha = P( CharIn('a' to 'z') )
val nocut = P( "val " ~ alpha.rep(1).! | "def " ~ alpha.rep(1).!)

val Parsed.Success("abcd", _) = nocut.parse("val abcd")

val failure = nocut.parse("val 1234").asInstanceOf[Parsed.Failure]
assert(
  failure.index == 0,
  failure.extra.traced.trace ==
  """nocut:1:1 / ("val " ~ alpha.rep(1) | "def " ~ alpha.rep(1)):1:1 ..."val 1234""""
)

Above we have a naive Scala definition parser: it parses either a val or a def, a space, and a (lower-case only) name. On success this works as expected and extracts the name. On failure, however, something odd happens: the deepest parser at the point of failure is reported as the entire Either, rather than just the alpha that came after "val ". Why is that?

By default, the parse has an opportunity to backtrack whenever it enters a | alternative, a .rep, or a .? parser.

E.g. in the case of p1 | p2, if it tries to parse p1 and fails, it then tries to parse p2. If that also fails, all FastParse knows is that one of them should have succeeded. Specifically, FastParse does not know that, after successfully parsing "val ", only the left branch of the Either is viable! Thus it has no choice but to offer both alternatives in the error message.

Cuts

val alpha = P( CharIn('a' to 'z') )
val nocut = P( "val " ~/ alpha.rep(1).! | "def " ~/ alpha.rep(1).!)

val Parsed.Success("abcd", _) = nocut.parse("val abcd")

val failure = nocut.parse("val 1234").asInstanceOf[Parsed.Failure]
assert(
  failure.index == 4,
  failure.extra.traced.trace ==
  """nocut:1:1 / alpha:1:5 / CharIn("abcdefghijklmnopqrstuvwxyz"):1:5 ..."1234""""
)

Cuts are added using the ~/ operator, which is similar to the sequence operator ~. Once the parse has crossed a cut, it can no longer backtrack past the point at which the cut occurred. Hence, in this case you can see that it no longer backtracks to index 0 and reports the enclosing Either parser in the error trace. Instead, it shows a much more precise error: at index 4, expecting one of the small set of lower-case letters.

In general, if you know that a parser is "committed" to one branch after parsing to a certain point, adding a cut will greatly improve the error message by ensuring that the parser itself knows that. Good places to add cuts include places like after keywords in a programming language parser, where a keyword can be followed by only one thing and anything else is an error.

Rep Cuts

val alpha = P( CharIn('a' to 'z') )
val stmt = P( "val " ~ alpha.rep(1).! ~ ";" ~ " ".rep )
val stmts = P( stmt.rep(1) ~ End )

val Parsed.Success(Seq("abcd"), _) = stmts.parse("val abcd;")
val Parsed.Success(Seq("abcd", "efg"), _) = stmts.parse("val abcd; val efg;")

val failure = stmts.parse("val abcd; val ").asInstanceOf[Parsed.Failure]
assert(
  failure.index == 10,
  failure.extra.traced.trace == """stmts:1:1 / (End | " "):1:11 ..."val """"
)

A similar problem occurs inside Repeat or Optional parsers, where the parser will give up and backtrack out if it fails, even if it really should succeed. Again, adding cuts would result in a more precise error message:

val alpha = P( CharIn('a' to 'z') )
val stmt = P( "val " ~/ alpha.rep(1).! ~ ";" ~ " ".rep )
val stmts = P( stmt.rep(1) ~ End )

val Parsed.Success(Seq("abcd"), _) = stmts.parse("val abcd;")
val Parsed.Success(Seq("abcd", "efg"), _) = stmts.parse("val abcd; val efg;")

val failure = stmts.parse("val abcd; val ").asInstanceOf[Parsed.Failure]
assert(
  failure.index == 14,
  failure.extra.traced.trace ==
    """stmts:1:1 / stmt:1:11 / alpha:1:15 / CharIn("abcdefghijklmnopqrstuvwxyz"):1:15 ..."""""
)

Another case where you may want to pay attention is when you are using delimiters with your .rep calls:

val digits = P( CharIn('0' to '9').rep(1) )
val tuple = P( "(" ~ digits.!.rep(sep=",") ~ ")" )

val Parsed.Success(Seq("1", "23"), _) = tuple.parse("(1,23)")

val failure = tuple.parse("(1,)").asInstanceOf[Parsed.Failure]
assert(
  failure.index == 2,
  failure.extra.traced.trace == """tuple:1:1 / (")" | CharIn("0123456789")):1:3 ...",)""""
)

In many (but not all!) cases, if a delimiter is parsed, you want to commit to parsing one more iteration of the Repeat. By default, however, the parser backtracks out of the Repeat entirely and starts trying to parse the next item in the sequence (in this case the ")"), giving the behavior shown above.

With a cut, the error is improved:

val digits = P( CharIn('0' to '9').rep(1) )
val tuple = P( "(" ~ digits.!.rep(sep="," ~/ Pass) ~ ")" )

val Parsed.Success(Seq("1", "23"), _) = tuple.parse("(1,23)")

val failure = tuple.parse("(1,)").asInstanceOf[Parsed.Failure]
assert(
  failure.index == 3,
  failure.extra.traced.trace == """tuple:1:1 / digits:1:4 / CharIn("0123456789"):1:4 ...")""""
)

The ~/ operator can be used without a following parser, as a shortcut for ~/ Pass. Compare the previous example with the following one:

val digits = P( CharIn('0' to '9').rep(1) )
val tuple = P( "(" ~ digits.!.rep(sep=",".~/) ~ ")" )

val Parsed.Success(Seq("1", "23"), _) = tuple.parse("(1,23)")

val failure = tuple.parse("(1,)").asInstanceOf[Parsed.Failure]
val trace = failure.extra.traced.trace
assert(
  failure.index == 3,
  trace == """tuple:1:1 / digits:1:4 / CharIn("0123456789"):1:4 ...")""""
)

Isolating Cuts

Because cuts prevent backtracking throughout the entire parser, they make it difficult to compose arbitrary parsers:

val digit = P( CharIn('0' to '9') )
val time1 = P( ("1".? ~ digit) ~ ":" ~/ digit ~ digit ~ ("am" | "pm") )
val time2 = P( (("1" | "2").? ~ digit) ~ ":" ~/ digit ~ digit )
val Parsed.Success((), _) = time1.parse("12:30pm")
val Parsed.Success((), _) = time2.parse("17:45")
val time = P( time1 | time2 )
val Parsed.Success((), _) = time.parse("12:30pm")
val failure = time.parse("17:45").asInstanceOf[Parsed.Failure]
assert(failure.index == 5)  // Expects am or pm

In the above case, time1 and time2 are arbitrary parsers containing Cuts. By default, that means that once you've crossed a cut, you can no longer backtrack. However, there are cases where you want to re-use an existing parser (for example time1) in a position where it should be allowed to backtrack, without rewriting it identically except with the cuts removed. Here the rewrite would be trivial, but if time1 were larger you would need to rewrite it and all of its transitive sub-parsers to make sure that not a single one had a cut inside!

To explicitly isolate a cut to one branch of a parser, place that branch within NoCut. Cuts within that branch will prevent backtracking inside that branch, but if that branch fails alternate branches will be tried as normal.

val digit = P( CharIn('0' to '9') )
val time1 = P( ("1".? ~ digit) ~ ":" ~/ digit ~ digit ~ ("am" | "pm") )
val time2 = P( (("1" | "2").? ~ digit) ~ ":" ~/ digit ~ digit )
val Parsed.Success((), _) = time1.parse("12:30pm")
val Parsed.Success((), _) = time2.parse("17:45")
val time = P( NoCut(time1) | time2 )
val Parsed.Success((), _) = time.parse("12:30pm")
val Parsed.Success((), _) = time.parse("17:45")

Unapply

The Parser class has a convenient unapply method which lets you pattern match against strings using parsers (much like regular expressions) in your code.

val capture1 = P( "a".rep.! ~ "b" ~ End)

val Parsed.Success("aaa", 4) = capture1.parse("aaab")

val capture2 = P( "a".rep.! ~ "b".! ~ End)

val Parsed.Success(("aaa", "b"), 4) = capture2.parse("aaab")

val capture3 = P( "a".rep.! ~ "b".! ~ "c".! ~ End)

val Parsed.Success(("aaa", "b", "c"), 5) = capture3.parse("aaabc")

val captureRep = P( "a".!.rep ~ "b" ~ End)

val Parsed.Success(Seq("a", "a", "a"), 4) = captureRep.parse("aaab")

If the parser succeeds, the return value of the parser is bound to the name in the pattern match. If the parser fails, the pattern match fails. Note that you do not have access to the failure details when using pattern-matching syntax (unapply) with a parser: if you need them, call .parse and pattern match on the Parsing Results yourself.

Streaming Parsing


In addition to parsing strings, you can also parse "streaming" data from Iterators. To do so, call .parseIterator instead of .parse on your parser and pass in an Iterator[String] (or Iterator[Array[Byte]] for Byte Parsers).

import fastparse.all._
val p = P( "ab" ~/ "cd".rep().! ~ "ef" | "z" )

val Parsed.Success(res, i) = p.parseIterator(
  Iterator("ab", "cd", "cd", "cd", "ef")
)

assert(res == "cdcdcd")

Note that FastParse does not parse Iterator[Char] or Iterator[Byte] for performance reasons: most input sources make data available in chunks, such as network packets or lines from a file on disk. By parsing chunks, FastParse better matches the underlying data source, and it also performs better when parsing larger chunks.
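One way to produce such chunked iterators from the standard library is with .grouped (the chunk sizes here are arbitrary choices for illustration):

```scala
// Turn a String into an Iterator[String] of fixed-size chunks, the
// kind of input .parseIterator expects.
def chunked(s: String, chunkSize: Int): Iterator[String] =
  s.grouped(chunkSize)

// For a file (hypothetical path), chunk the character stream similarly:
// scala.io.Source.fromFile("data.txt").grouped(1024).map(_.mkString)

val chunks = chunked("abcdefghij", 4).toList
assert(chunks == List("abcd", "efgh", "ij"))
```

The chunk size is up to you; as the benchmarks in this section show, it trades buffer size against parsing speed.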

Streaming parsing still needs to buffer input in memory: parsers like Optional, Repeat or Either mean a parser may backtrack, so FastParse must buffer any input from the point where such a parser starts parsing. Other parsers like Capture do not backtrack, but need to buffer data in order to return what they captured. Using Cuts to prevent backtracking, apart from making Debugging Parsers easier, also allows FastParse to flush parts of the buffer that it can no longer backtrack into.

In general, every cut in your parser potentially reduces the memory used to buffer input for iterator parsing.

Streaming Parsing Buffer Size

This first benchmark measures the maximum size of buffered input when using streaming parsing, for some sample parsers we have in the test suite, for input-chunks of size 1 and 1024:

Parser        Maximum buffer       Maximum buffer          Size of input   Used memory
              (1-sized chunks)     (1024-sized chunks)
ScalaParse    1555                 2523                    147894          1.4%
PythonParse   2006                 2867                    68558           3.6%
BmpParse      36                   1026                    786486          0.01%
ClassParse    476                  1371                    332142          0.3%

As you can see, for these "typical" parsers, some input needs to be buffered to allow backtracking, but it turns out to be only a few percent of the total file size.

These parsers make heavy use of backtracking operators like Either or Repeat, but also make heavy use of Cuts. This lets FastParse drop buffered input when it knows it can no longer backtrack.

Another thing to note is the chunk size: a smaller chunk size reduces the granularity of the chunks that get buffered, reducing the buffer size. However, this comes at a performance cost, as you can see below...

Streaming Parsing Performance

This next benchmark measures the effect of streaming parsing on runtime performance, using two different chunk-sizes, compared to the performance of non-streaming parsing:

Parser        Plain parsing   Iterator parsing     Iterator parsing
              score           (1-sized chunks)     (1024-sized chunks)
ScalaParse    43              33                   43
PythonParse   1150            600                  890
BmpParse      195             15                   40
ClassParse    160             40                   100

Here, we can see that streaming parsing has a non-trivial effect on performance: ScalaParse seems unaffected by chunks of size 1024 and takes a 25% performance hit for chunks of size 1, but PythonParse takes a significant hit (25% and 47% respectively), and ClassParse and BmpParse fare much worse. While smaller chunk sizes result in smaller buffers, they also come with a performance cost. Exactly how big you want your input chunks to be is up to you: FastParse will accept an iterator of chunks as large or as small as you like.

In general, streaming parsing is always going to be a performance hit over parsing a single String you pre-loaded into memory. The point of streaming parsing is to handle cases where you can't, or don't want to, load everything into memory. In that case, if the choice is between slightly slower parsing and an OutOfMemory error, streaming parsing is a good option to have.

Streaming Parsing Limitations

Apart from the performance/memory tradeoff mentioned above, streaming parsing has some limitations that are worth being aware of.

Example Parsers


Above, we've already covered all the individual bits and pieces that make writing a parser possible. But how does that fit together? Let's take a look at some examples.

Math

import fastparse.all._

val number: P[Int] = P( CharIn('0'to'9').rep(1).!.map(_.toInt) )
val parens: P[Int] = P( "(" ~/ addSub ~ ")" )
val factor: P[Int] = P( number | parens )

val divMul: P[Int] = P( factor ~ (CharIn("*/").! ~/ factor).rep ).map(eval)
val addSub: P[Int] = P( divMul ~ (CharIn("+-").! ~/ divMul).rep ).map(eval)
val expr: P[Int]   = P( addSub ~ End )

This is a small arithmetic expression parser, the same one shown at the top of this page. It parses only whole integers, parentheses, +-*/, and no whitespace.

One thing to note is the eval helper below, which folds the sequence of (operator, operand) pairs produced by .rep into a single result:

def eval(tree: (Int, Seq[(String, Int)])) = {
  val (base, ops) = tree
  ops.foldLeft(base){ case (left, (op, right)) => op match {
    case "+" => left + right
    case "-" => left - right
    case "*" => left * right
    case "/" => left / right
  }}
}
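To see why the fold is the right shape, here it is applied directly (a self-contained copy of eval above): the left fold gives left-associative evaluation, while operator precedence comes from layering divMul inside addSub in the grammar, not from eval itself.

```scala
// Self-contained copy of the eval helper from the math parser.
def eval(tree: (Int, Seq[(String, Int)])): Int = {
  val (base, ops) = tree
  ops.foldLeft(base){ case (left, (op, right)) => op match {
    case "+" => left + right
    case "-" => left - right
    case "*" => left * right
    case "/" => left / right
  }}
}

// "6/3/2" parses to (6, Seq(("/",3), ("/",2))): folded left to right,
// (6/3)/2 = 1, not 6/(3/2).
assert(eval((6, Seq(("/", 3), ("/", 2)))) == 1)
// "1+2*3" never reaches addSub's eval as a flat list: divMul has
// already folded 2*3 into 6 before addSub sees it.
assert(eval((1, Seq(("+", 6)))) == 7)
```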

This is a small example, but it works. We check it to verify that every parse results in the expected integer:

val Parsed.Success(2, _) = expr.parse("1+1")
val Parsed.Success(15, _) = expr.parse("(1+1*2)+3*4")
val Parsed.Success(21, _) = expr.parse("((1+1*2)+(3*4*5))/3")
val Parsed.Failure(expected, failIndex, extra) = expr.parse("1+1*")
assert(expected == (number | parens), failIndex == 4)

Try it out yourself! Remember that it does not handle whitespace.

Whitespace Handling

val White = WhitespaceApi.Wrapper{
  import fastparse.all._
  NoTrace(" ".rep)
}
import fastparse.noApi._
import White._
def eval(tree: (Int, Seq[(String, Int)])): Int = {
  val (base, ops) = tree
  ops.foldLeft(base){ case (left, (op, right)) => op match {
    case "+" => left + right
    case "-" => left - right
    case "*" => left * right
    case "/" => left / right
  }}
}
val number: P[Int] = P( CharIn('0'to'9').rep(1).!.map(_.toInt) )
val parens: P[Int] = P( "(" ~/ addSub ~ ")" )
val factor: P[Int] = P( number | parens )

val divMul: P[Int] = P( factor ~ (CharIn("*/").! ~/ factor).rep ).map(eval)
val addSub: P[Int] = P( divMul ~ (CharIn("+-").! ~/ divMul).rep ).map(eval)
val expr: P[Int]   = P( " ".rep ~ addSub ~ " ".rep ~ End )

To handle whitespace and other non-significant characters with FastParse, use the WhitespaceApi as a substitute for the normal parser API. It modifies the ~ and .rep operators to consume and ignore all non-trailing whitespace.

Note how you can pass in whatever definition of whitespace you want: here we're passing in a simple " ".rep, but in a more sophisticated parser you may wish to include tabs, newlines, comments or even nested comments. The whitespace parser can be arbitrarily complex.

Note also how we're importing from fastparse.noApi instead of fastparse.all, and then substituting in our own whitespace-consuming parser API. Other than that, the parser is identical, except for the added whitespace parsers at the start and end of the expr rule.

Here it is in action:

def check(str: String, num: Int) = {
  val Parsed.Success(value, _) = expr.parse(str)
  assert(value == num)
}

check("1+1", 2)
check("1+   1*   2", 3)
check("(1+   1  *  2)+(   3*4*5)", 63)
check("15/3", 5)
check("63  /3", 21)
check("(1+    1*2)+(3      *4*5)/20", 6)
check("((1+      1*2)+(3*4*5))/3", 21)

Or try it yourself.

Indentation Grammars

def eval(tree: (String, Seq[Int])) = tree match{
  case ("+", nums) => nums.reduceLeft(_+_)
  case ("-", nums) => nums.reduceLeft(_-_)
  case ("*", nums) => nums.reduceLeft(_*_)
  case ("/", nums) => nums.reduceLeft(_/_)
}

/**
 * Parser for an indentation-based math syntax. Parens are no longer
 * necessary, and the whole parser is parametrized with the current
 * depth of indentation
 */
class Parser(indent: Int){
  val number: P[Int] = P( CharIn('0'to'9').rep(1).!.map(_.toInt) )

  val deeper: P[Int] = P( " ".rep(indent + 1).!.map(_.length) )
  val blockBody: P[Seq[Int]] = "\n" ~ deeper.flatMap(i =>
    new Parser(indent = i).factor.rep(1, sep = ("\n" + " " * i).~/)
  )
  val block: P[Int] = P( CharIn("+-*/").! ~/ blockBody).map(eval)

  val factor: P[Int] = P( number | block )

  val expr: P[Int]   = P( block ~ End )
}
val expr = new Parser(indent = 0).expr

Here is a parser for a simple indentation-based math syntax. To understand the grammar it parses, it is worth looking at the test data:

def check(str: String, num: Int) = {
  val Parsed.Success(value, _) = expr.parse(str)
  assert(value == num)
}

check(
  """+
    |  1
    |  1
  """.stripMargin.trim,
  2
)
check(
  """+
    |  1
    |  *
    |    1
    |    2
  """.stripMargin.trim,
  3
)

As you can see, it is basically a prefix math evaluator, where indentation is used to pass each operator the numbers or sub-expressions it operates on.
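Concretely (a pure-Scala sketch, not from the docs), the second test case above reduces its inner * block first, then the outer + block:

```scala
// Pure-Scala sketch of the prefix evaluation; mirrors the `eval` defined above
def eval(tree: (String, Seq[Int])): Int = tree match {
  case ("+", nums) => nums.reduceLeft(_ + _)
  case ("-", nums) => nums.reduceLeft(_ - _)
  case ("*", nums) => nums.reduceLeft(_ * _)
  case ("/", nums) => nums.reduceLeft(_ / _)
}

val inner = eval(("*", Seq(1, 2))) // the indented `*` block: 1 * 2 = 2
val outer = eval(("+", Seq(1, inner))) // the outer `+` block: 1 + 2 = 3
assert(outer == 3)
```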

As for the parser, the novel things are:

Note how there is no pre-processing, and no lexing phase where a lexer has to guess where in the token stream to inject synthetic indent and dedent tokens. Everything happens in a single pass.

Try it out!

Json

// Here is the parser
val StringChars = NamedFunction(!"\"\\".contains(_: Char), "StringChars")

val space         = P( CharsWhileIn(" \r\n").? )
val digits        = P( CharsWhileIn("0123456789"))
val exponent      = P( CharIn("eE") ~ CharIn("+-").? ~ digits )
val fractional    = P( "." ~ digits )
val integral      = P( "0" | CharIn('1' to '9') ~ digits.? )

val number = P( CharIn("+-").? ~ integral ~ fractional.? ~ exponent.? ).!.map(
  x => Js.Num(x.toDouble)
)

val `null`        = P( "null" ).map(_ => Js.Null)
val `false`       = P( "false" ).map(_ => Js.False)
val `true`        = P( "true" ).map(_ => Js.True)

val hexDigit      = P( CharIn('0'to'9', 'a'to'f', 'A'to'F') )
val unicodeEscape = P( "u" ~ hexDigit ~ hexDigit ~ hexDigit ~ hexDigit )
val escape        = P( "\\" ~ (CharIn("\"/\\bfnrt") | unicodeEscape) )

val strChars = P( CharsWhile(StringChars) )
val string =
  P( space ~ "\"" ~/ (strChars | escape).rep.! ~ "\"").map(Js.Str)

val array =
  P( "[" ~/ jsonExpr.rep(sep=",".~/) ~ space ~ "]").map(Js.Arr(_:_*))

val pair = P( string.map(_.value) ~/ ":" ~/ jsonExpr )

val obj =
  P( "{" ~/ pair.rep(sep=",".~/) ~ space ~ "}").map(Js.Obj(_:_*))

val jsonExpr: P[Js.Val] = P(
  space ~ (obj | array | string | `true` | `false` | `null` | number) ~ space
)

This is a somewhat larger example than the math parser shown above. In it, we parse a JSON expression from a string, with proper whitespace handling and error-handling built in.

Things to note:

object Js {
  sealed trait Val extends Any {
    def value: Any
    def apply(i: Int): Val = this.asInstanceOf[Arr].value(i)
    def apply(s: java.lang.String): Val =
      this.asInstanceOf[Obj].value.find(_._1 == s).get._2
  }
  case class Str(value: java.lang.String) extends AnyVal with Val
  case class Obj(value: (java.lang.String, Val)*) extends AnyVal with Val
  case class Arr(value: Val*) extends AnyVal with Val
  case class Num(value: Double) extends AnyVal with Val
  case object False extends Val{
    def value = false
  }
  case object True extends Val{
    def value = true
  }
  case object Null extends Val{
    def value = null
  }
}

case class NamedFunction[T, V](f: T => V, name: String) extends (T => V){
  def apply(t: T) = f(t)
  override def toString() = name

}

We can verify that this parser builds the JSON tree that we expect:

val Parsed.Success(value, _) = jsonExpr.parse(
  """{"omg": "123", "wtf": 12.4123}"""
)
assert(value == Js.Obj("omg" -> Js.Str("123"), "wtf" -> Js.Num(12.4123)))

And that it provides good error messages in the case of malformed JSON, even for moderately-sized fragments:

{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": 10021
    },
    "phoneNumbers":
        {
            "type": "home",
            "number": "212 555-1234"
        },
        {
            "type": "fax",
            "number": "646 555-4567"
        }
    ]
}
jsonExpr:1:1 / obj:2:9 / pair:16:19 / string:16:19 / "\"":17:17 ..."{\n        "

Here, we're missing a square bracket after the "phoneNumbers" key, so the parser expects to find a single JSON expression. It finds a JSON object, and then fails, reporting that it expected to find the next key (a string) but instead found "{\n" at that index.
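To produce a trace like the one above yourself (a sketch, assuming jsonExpr is in scope and badJson is a hypothetical variable holding the malformed document):

```scala
// Sketch: forcing the lazily-computed trace of a failed parse
jsonExpr.parse(badJson) match {
  case f: Parsed.Failure =>
    // `.extra.traced` re-runs the parse with instrumentation,
    // so the detailed trace only costs you when you ask for it
    println(f.extra.traced.trace)
  case Parsed.Success(value, _) =>
    println(value)
}
```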

Try it out!

ScalaParse

ScalaParse is a parser for the entire Scala programming language, written using FastParse. This is notable for a few reasons:

ScalaParse does not currently generate an AST. However, that does not make it useless! Even without generating an AST, ScalaParse can be used to:

Using ScalaParse

To begin using ScalaParse, add
"com.lihaoyi" %% "scalaparse" % "0.4.3"

To your SBT configuration. To use with Scala.js, you'll need

"com.lihaoyi" %%% "scalaparse" % "0.4.3"

PythonParse

There is now an example Python parser available as a subproject in the repo. This is a good example of a real-world parser: parsing knotty syntax (including indentation-delimited blocks!), building an AST, and backed by heavy unit tests.

PythonParse is currently compatible enough to parse all the Python sources in Zulip, Ansible, Changes, Django, and Flask. Feel free to look at it if you want an idea of how to write a complex, real parser. It is available as:

"com.lihaoyi" %%% "pythonparse" % "0.4.3"

CssParse

CssParse is a parser that parses CSS files into an abstract syntax tree (AST). The implementation is too large to show in-line, but can be found here:

This AST can then be used for a variety of reasons: you could analyze the CSS to try and find bugs, transform the CSS in some custom way (e.g. prefixing class-names with the name of the file) or just re-formatting the CSS when printing it back out.

CssParse compiles to Javascript via Scala.js, and there is an online demo that uses CssParse as a CSS pretty-printer: given any CSS, no matter how it's formatted or minified, CssParse adds the necessary spaces and tabs to make the file readable and good-looking.

As mentioned above, CssParse builds an AST that stores information about the tags and rules in the given CSS. This AST isn't complete, due to the complexity of the CSS format, but it provides all the essential information about the basic elements of a file (blocks, selectors, rules). The correctness of CssParse is tested by parsing and then printing several huge files, including the CSS from Bootstrap and Primer.

This is available on Maven Central as

"com.lihaoyi" %%% "cssparse" % "0.4.3"

API Highlights


Parser

FastParse revolves around Parser objects: a parser that can attempt to parse a value T from an input sequence of elements of type Elem. The Repr type-parameter determines the output type of Capture, since the input is converted to an IndexedSeq[Elem] or Iterator[IndexedSeq[Elem]] during parsing operations.

There are two main cases: for string parsers, you are looking at Parser[T, Char, String]; for byte parsers, you would be dealing with Parser[T, Byte, Bytes].

These are defined as:

/**
 * A single, self-contained, immutable parser. The primary method is
 * `parse`, which returns a [[T]] on success and a stack trace on failure.
 *
 * Some small optimizations are performed in-line: collapsing
 * [[parsers.Combinators.Either]] cells into large ones and collapsing
 * [[parsers.Combinators.Sequence]] cells into
 * [[parsers.Combinators.Sequence.Flat]]s. These optimizations together appear
 * to make things faster by about 10%, whether or not you activate tracing.
 */
abstract class Parser[+T, Elem, Repr]()(implicit val reprOps: ReprOps[Elem, Repr])
  extends ParserResults[T, Elem, Repr] with Precedence{

  /**
    * Can be passed into a `.parse` call to let you inject logic around
    * the parsing of top-level parsers, e.g. for logging and debugging.
    */
  type InstrumentCallback = (
    (Parser[_, Elem, Repr], Int, () => Parsed[_, Elem, Repr]) => Unit
  )

  /**
   * Parses the given `input` starting from the given `index`
   *
   * @param input The string we want to parse
   *
   * @param index The index in the string to start from. By default parsing
   *              starts from the beginning of a string, but you can start
   *              from halfway through the string if you want.
   *
   * @param instrument Allows you to pass in a callback that will get called
   *                   by every named rule with its index, and is itself given a
   *                   callback that can be used to recurse into the parse and
   *                   return the result. Very useful for extracting auxiliary
   *                   information from the parse, e.g. counting rule
   *                   invocations to locate bottlenecks or unwanted
   *                   backtracking in the parser.
   */
  def parse(input: Repr,
            index: Int = 0,
            instrument: InstrumentCallback = null)
      : Parsed[T, Elem, Repr] = {
    parseInput(IndexedParserInput(input), index, instrument)
  }

  def parseIterator(input: Iterator[Repr],
                    index: Int = 0,
                    instrument: InstrumentCallback = null)
                   (implicit ct: ClassTag[Elem])
      : Parsed[T, Elem, Repr] = {
    parseInput(IteratorParserInput(input), index, instrument)
  }

  def parseInput(input: ParserInput[Elem, Repr],
                 index: Int = 0,
                 instrument: InstrumentCallback = null)

      : Parsed[T, Elem, Repr] = {
    parseRec(
      new ParseCtx(input, 0, -1, this, index, instrument, false, false, false),
      index
    ).toResult
  }

  /**
   * Extractor for pattern matching
   *
   *  For example:
   *
   *  {{{
   *  val p1 = CharIn('a'to'z').! ~ CharIn('0'to'9').rep(1).!.map(_.toInt)
   *  List("a42x") match {
   *    case p1(x: String, y: Int) :: _ => // x is "a", y is 42
   *  }
   * }}}
   */
  def unapply(input: Repr): Option[T] = {
    parse(input) match {
      case Parsed.Success(r, _) => Some(r)
      case _ => None
    }
  }

  /**
   * Parses the given `input` starting from the given `index` and `logDepth`
   */
  def parseRec(cfg: ParseCtx[Elem, Repr], index: Int): Mutable[T, Elem, Repr]

  /**
   * Whether or not this parser should show up when
   * [[Parsed.TracedFailure.trace]] is called. If not set, the parser will
    * only show up in [[Parsed.TracedFailure.fullStack]]
   */
  def shortTraced: Boolean = false

  /**
   * Whether or not to surround this parser with parentheses when printing.
   * By default a top-level parser is always left without parentheses, but
   * if a sub-parser is embedded inside with lower precedence, it will be
   * surrounded. Set to `Integer.MaxValue` to never be parenthesized
   */
  def opPred: Int = Precedence.Max
}

Typically, you will be dealing with the aliased version of this inside import fastparse.all._:

type Parsed[+T] = core.Parsed[T, String]
type Parser[+T] = core.Parser[T, Char, String]

Or if you're writing Byte Parsers:

type Parsed[+T] = core.Parsed[T, Array[Byte]]
type Parser[+T] = core.Parser[T, Byte, Array[Byte]]

The main external API is .parse for parsing regular arrays of data and .parseIterator for parsing streaming data (see also Streaming Parsing). As you can see, apart from the input parameter, there are a few parameters that you can use to configure the parse. In addition, each Parser[T, Elem, Repr] needs to implement parseRec, a less-convenient but more-performant version that FastParse uses internally when performing a parse.

This class also supports an unapply method, which can be used in Scala match expressions.
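For example (a sketch, assuming the number rule from the math parser above is in scope), a parser value can be used directly as an extractor:

```scala
// Sketch: a Parser's `unapply` lets it act as a pattern;
// the match succeeds only if the parse succeeds
"123" match {
  case number(n) => println(s"parsed integer $n")
  case _         => println("not a number")
}
```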

Although the core of Parser is simple, a lot of additional functionality is included in the ParserApi[T, Elem, Repr] trait in order to make constructing parsers convenient and concise.

ParserApi

Apart from the core Parser, FastParse includes a large set of operations that you can perform on a Parser to make composing them more pleasant. These all live in ParserApi:

abstract class ParserApi[+T, Elem, Repr]()(implicit repr: ReprOps[Elem, Repr]) {

  /**
   * Wraps this in a [[Logged]]. This prints out information
   * where a parser was tried and its result, which is useful for debugging
   */
  def log(msg: String = toString)(implicit out: Logger): Parser[T, Elem, Repr]

  /**
   * Makes this parser opaque, i.e. hides it and its inner parsers
   * from the stack trace, providing the specified message instead.
   */
  def opaque(msg: String = toString): Parser[T, Elem, Repr]

  /**
   * Repeats this parser 0 or more times
   */
  def rep[R](implicit ev: Repeater[T, R]): Parser[R, Elem, Repr]
  def rep[R](min: Int = 0,
             sep: Parser[_, Elem, Repr] = Pass[Elem, Repr],
             max: Int = Int.MaxValue,
             exactly: Int = -1)
            (implicit ev: Repeater[T, R]): Parser[R, Elem, Repr]

  /**
   * Parses using this or the parser `p`
   */
  def |[V >: T](p: Parser[V, Elem, Repr]): Parser[V, Elem, Repr]

  /**
   * Parses using this followed by the parser `p`
   */
  def ~[V, R](p: Parser[V, Elem, Repr])
             (implicit ev: Sequencer[T, V, R]): Parser[R, Elem, Repr]

  /**
   * Parses using this followed by the parser `p`, performing a Cut if
   * this parses successfully. That means that if `p` fails to parse, the
   * parse will fail immediately and not backtrack past this success.
   *
   * This lets you greatly narrow the error position by avoiding unwanted
   * backtracking.
   */
  def ~/[V, R](p: Parser[V, Elem, Repr])
              (implicit ev: Sequencer[T, V, R]): Parser[R, Elem, Repr]

  /**
   * Performs a cut if this parses successfully.
   */
  def ~/ : Parser[T, Elem, Repr]

  /**
   * Parses this, optionally
   */
  def ?[R](implicit ev: Optioner[T, R]): Parser[R, Elem, Repr]

  /**
   * Wraps this in a [[Not]] for negative lookahead
   */
  def unary_! : Parser[Unit, Elem, Repr]

  /**
   * Used to capture the text parsed by this as a `String`
   */
  def ! : Parser[Repr, Elem, Repr]

  /**
   * Transforms the result of this Parser with the given function
   */
  def map[V](f: T => V): Parser[V, Elem, Repr]

  /**
   * Uses the result of this parser to create another parser that
   * will be used for the next parse
   */
  def flatMap[V](f: T => Parser[V, Elem, Repr]): Parser[V, Elem, Repr]

  /**
   * Applies the supplied predicate to the result of this parser,
   * succeeding if it returns true and failing if it returns false
   */
  def filter(predicate: T => Boolean): Parser[T, Elem, Repr]

  /**
   * alias for `filter`
   */
  final def withFilter(predicate: T => Boolean): Parser[T, Elem, Repr] = filter(predicate)
}

These are essentially all short-hand constructors for the parsers in the Parser companion object. This is the list of operators that you have available when writing your own parsers using FastParse.

As mentioned in Whitespace Handling, you can choose to ignore the default set of operators by using import fastparse.noApi instead of import fastparse.all. That way you can use your own set of operators, e.g. the whitespace-sensitive operators described in that section.

Parsing Results

The result of a parser comes in two flavors of Parsed: Parsed.Success and Parsed.Failure. Parsed.Success provides the parsed value - the value you are probably most interested in - and the index in the input string up to which the parse was performed. Parsed.Failure allows you to retrieve the last parser that failed and the index at which it failed. Additionally, a failure provides a Parsed.Failure.extra field with precise details about the failure, most importantly a complete stack trace of the involved parsers, which is accessible via extra.traced.

The recommended method for dealing with Parsed is to use fold which accepts two callbacks. The first callback deals with a failed parse attempt and has a type signature of (Parser[_], Int, Failure.Extra) => X. Each input parameter corresponds to what is available in Parsed.Failure. The second callback deals with a successful parsing and has type signature (T, Int) => X where T is the parsed result and the Int is the index of the string where the parsing was performed.

sealed trait AndOr
case object And extends AndOr
case object Or extends AndOr
val and = P(IgnoreCase("And")).map(_ => And)
val or = P(IgnoreCase("Or")).map(_ => Or)
val andOr = P(and | or)

def check(input: String, expectedOutput: String) =
  assert(andOr.parse(input).fold((_, _, _) => s"Cannot parse $input as an AndOr", (v, _) => s"Parsed: $v") == expectedOutput)

check("AnD", "Parsed: And")
check("oR", "Parsed: Or")
check("IllegalBooleanOperation", "Cannot parse IllegalBooleanOperation as an AndOr")

It is also possible to pattern match over Parsed; however, you may experience spurious warnings related to SI-4440. In order to prevent these warnings, import fastparse.core.Result in versions 0.2.x, or import fastparse.core.Parsed in versions later than 0.2.x.
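A match over the same andOr parser might look like this (a sketch using the definitions above):

```scala
// Sketch: pattern-matching on Parsed instead of using fold
andOr.parse("And") match {
  case Parsed.Success(value, index) => println(s"Parsed: $value, consumed $index characters")
  case f: Parsed.Failure            => println(s"Failed at index ${f.index}")
}
```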

An overview of Parsed:

/**
 * @param value The result of this parse
 * @param index The index where the parse completed; may be less than
 *              the length of input
 */
case class Success[+T, Elem, Repr](value: T, index: Int) extends Parsed[T, Elem, Repr]
/**
 * Simple information about a parse failure. Also contains the original parse
 * information necessary to construct the traced failure. That contains more
 * information but is more costly to compute and is thus computed lazily on
 * demand.
 *
 * @param index The index in the parse where this parse failed
 * @param lastParser The deepest parser in the parse which failed
 * @param extra Extra supplementary information (including trace information).
 *              For details see [[Parsed.Failure.Extra]]
 */
case class Failure[Elem, Repr](lastParser: Parser[_, Elem, Repr],
                               index: Int,
                               extra: Failure.Extra[Elem, Repr]) extends Parsed[Nothing, Elem, Repr]{

  def msg = Failure.formatStackTrace(
    Nil, extra.input, index, Failure.formatParser(lastParser, extra.input, index)
  )

  override def toString = s"Failure($msg)"
}

Note how Failure only contains the parser which failed and a single index where the parse failed. Further debugging information is available via the Failure.Extra class. In particular, the TracedFailure that is lazily computed via Extra.traced provides valuable information: it performs a whole new parse on the input data with additional instrumentation, providing additional insight into why the parse failed:

/**
 * A failure containing detailed information about a parse failure. This is more
 * expensive to compute than a simple error message and is thus not generated
 * by default.
 *
 * @param fullStack The entire stack trace where the parse failed, containing every
 *                  parser in the stack and the index where the parser was used, excluding
 *                  the final parser and index where the parse failed. Only set if
 *                  `parse` is called with `trace = true`, otherwise empty
 * @param traceParsers A list of parsers that could have succeeded at the location
 *                     that this
 */
case class TracedFailure[Elem, Repr](input: ParserInput[Elem, Repr],
                                     index: Int,
                                     fullStack: Vector[Frame],
                                     traceParsers: Set[Parser[_, Elem, Repr]]) {

  private[this] lazy val expected0 = new Precedence {
    def opPred = if (traceParsers.size == 1) traceParsers.head.opPred else Precedence.|
    override def toString = traceParsers.map(opWrap).mkString(" | ")
  }

  /**
   * A short string describing the parsers which were expected at the point
   * of failure.
   */
  def expected = expected0.toString

  /**
   * A slimmed down version of [[fullStack]], this only includes named
   * [[parsers.Combinators.Rule]] objects as well as the final Parser (whether named or not)
   * and index where the parse failed for easier reading.
   */
  lazy val stack = Failure.filterFullStack(fullStack)

  /**
   * A one-line snippet that tells you what the state of the parser was
   * when it failed. This message is completely derived from other values
   * available on this object, so feel free to use the data yourself if
   * the default error message isn't to your liking.
   */
  lazy val trace = {
    Failure.formatStackTrace(stack, input, index, Failure.formatParser(expected0, input, index))
  }
}

Computing the Extra.traced data is not done by default for performance reasons: the additional run takes about 3x longer than the initial run due to the instrumentation, for a total of 4x slowdown. If you want the information for debugging, though, it will be there.

Byte Parsers


While FastParse was originally designed to parse Strings, it also provides a fastparse.byte package that allows parsing of Bytes (or Iterator[Bytes]s) into structured data.

To begin using FastParse to parse bytes, add

"com.lihaoyi" %% "fastparse-byte" % "0.4.3"

To your SBT configuration. To use with Scala.js, you'll need

"com.lihaoyi" %%% "fastparse-byte" % "0.4.3"

Here is a small example parser that parses an imaginary struct-like format: a fixed header, two fixed-length fields (big-endian), one 0-terminated C-style string, and one length-prefixed binary blob of up to 65535 bytes (the max size of the UInt16 length field):

import fastparse.byte.all._

case class Struct(f: Float, i: Int, s: String, b: Bytes)

val header = P( BS(0xDE, 0xAD, 0xBE, 0xEF) ~ BE.Float32 ~ BE.Int32 )

val cString = P( BytesWhile(_ != 0).! ~ BS(0x0) ).map(x => new String(x.toArray))

val varLengthBlob = BE.UInt16.flatMap{AnyBytes(_).!}

val structThing = P( header ~ cString ~ varLengthBlob ~ End).map(Struct.tupled)

As you can see, it has the same underlying structure as the earlier basic string parsers: it uses ~ to Sequence parsers, ! to Capture sections of bytes, .map to turn them into useful structures, etc.

Here is how we use it:

val bytes = Bytes(
  0xDE, 0xAD, 0xBE, 0xEF,
  64, 73, 15, -37,
  0, 0, 122, 105,
  104, 101, 108, 108, 111, 0,
  0, 5, 119, 111, 114, 108, 100
)

val Parsed.Success(result, _) = structThing.parse(bytes)


assert(
  math.abs(result.f - Math.PI) < 0.0001,
  result.i == 31337,
  result.s == "hello",
  new String(result.b.toArray) == "world"
)

About the same as what you would expect, if you had used FastParse to parse textual Strings.

While fastparse.byte is similar to the string-parsing API, there are some new primitives that are specific to byte parsers, and some existing primitives that are named slightly differently.

Bytes

The primary data-structure fastparse.byte works with is Bytes. This gets imported when you import fastparse.byte.all._, and is an alias for the excellent scodec.bits.ByteVector class from scodec-bits by Michael Pilquist. Bytes is basically an immutable version of Array[Byte], with structural equality, a useful toString, and convenient operations for all the things you typically want to do with blobs of bytes. fastparse.byte's .parse method takes Bytes as input, and its Capture operator captures Bytes as output.

Bytes provides convenient methods for construction:

import fastparse.byte.all._

// Constructing a short ByteVector from bytes
val a = Bytes(0x01, 0xff)
assert(a.toString == "ByteVector(2 bytes, 0x01ff)")

// Constructing a ByteVector from an ASCII string
val b  = Bytes("hello":_*)
assert(b.toString == "ByteVector(5 bytes, 0x68656c6c6f)")

// Constructing a ByteVector copying from a Array[Byte]
val byteArray = Array[Byte](1, 2, 3, 4)
val c = Bytes(byteArray)
assert(c.toString == "ByteVector(4 bytes, 0x01020304)")

// Unsafe construction from an Array[Byte], without copying; assumes
// You do not mutate the underlying array, otherwise things break.
val d = Bytes.view(byteArray)
assert(d.toString == "ByteVector(4 bytes, 0x01020304)")

// Hex Strings
val e = hex"cafebabedeadbeef"
assert(e.toString == "ByteVector(8 bytes, 0xcafebabedeadbeef)")

Bytes provides many of the common operations that you would want to perform on binary blobs built in. These more or less match what you would expect from the standard Scala collections:

import fastparse.byte.all._

val a = Bytes(0xff, 0xff)
val b = Bytes(0x01, 0x01)

// Simple operations
assert(
  a.length == 2,
  a(0) == 0xff.toByte,
  b(0) == 0x01.toByte
)

// Concatenating byte vectors
val c = a ++ b
assert(c.toString == "ByteVector(4 bytes, 0xffff0101)")

val d = b ++ a
assert(d.toString == "ByteVector(4 bytes, 0x0101ffff)")

// Converting to Arrays
val arr = d.toArray
assert(
  arr(0) == 1, arr(1) == 1,
  arr(2) == -1, arr(3) == -1
)


// Convenient conversions
assert(
  c.toHex == "ffff0101",
  c.toBase64 == "//8BAQ==",
  Bytes.fromHex("ffff0101").get == c,
  Bytes.fromHex("""
    ff ff
    01 01
  """).get == c,
  Bytes.fromBase64("//8BAQ==").get == c
)

These are just the highlights of the most common operations on a Bytes. There are many more methods on Bytes instances as well as on the Bytes companion object, but you can explore those yourselves in your editor's autocomplete menu.

In general, it is reasonable to use Bytes throughout your codebase wherever you need to represent binary data. In the few cases where performance is of utmost importance, or you really need read-write mutability, an Array[Byte] will be necessary, but for the majority of unchanging blobs-of-bytes a Bytes will do just fine.

Byte Helpers

Apart from providing an equivalent API for byte parsing, FastParse's byte-parsers also provide some helper functions for conveniently dealing with binary blobs.

Hex Bytes

FastParse includes Scodec's hex-interpolator:

import fastparse.byte.all._

// Constructing `Bytes` via their hex values
assert(
  hex"01 ff" == Bytes(0x01, 0xff),

  hex"01 ff" == Bytes.fromHex("01 ff").get,

  hex"""
    de ad be ef
    ca fe ba be
  """ == Bytes(
    0xde, 0xad, 0xbe, 0xef,
    0xca, 0xfe, 0xba, 0xbe
  ),

  hex"""
  de ad be ef
  ca fe ba be
  """ == Bytes.fromHex("""
    de ad be ef
    ca fe ba be
  """).get
)


// You can interpolate Bytes into hex literals

val beef = hex"de ad be ef"
val cafeBytes = Bytes(0xca, 0xfe)
assert(
  hex"$beef $cafeBytes ba be" == Bytes(
    0xde, 0xad, 0xbe, 0xef,
    0xca, 0xfe, 0xba, 0xbe
  )
)

This is a macro that converts the hex literal into a Bytes value at compile time. This should be both fast and safe: if you have a typo in your hex literal, it generates an error at compile time.

prettyBytes

This function converts a byte array into a nicely-readable hex-grid.

import fastparse.byte.all._

val blob = Bytes(
  "The quick brown fox jumps over the lazy dog. She sells " +
  "sea shells on the sea shore but the shells she sells she " +
  "sells no more. Peter piper picked a pack of pickled peppers.":_*
)
assert( prettyBytes(blob) ==
  """       0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
    |
    |0      54 68 65 20 71 75 69 63 6b 20 62 72 6f 77 6e 20
    |16     66 6f 78 20 6a 75 6d 70 73 20 6f 76 65 72 20 74
    |32     68 65 20 6c 61 7a 79 20 64 6f 67 2e 20 53 68 65
    |48     20 73 65 6c 6c 73 20 73 65 61 20 73 68 65 6c 6c
    |64     73 20 6f 6e 20 74 68 65 20 73 65 61 20 73 68 6f
    |80     72 65 20 62 75 74 20 74 68 65 20 73 68 65 6c 6c
    |96     73 20 73 68 65 20 73 65 6c 6c 73 20 73 68 65 20
    |112    73 65 6c 6c 73 20 6e 6f 20 6d 6f 72 65 2e 20 50
    |128    65 74 65 72 20 70 69 70 65 72 20 70 69 63 6b 65
    |       ...""".stripMargin
)

This makes it easier for you to look for byte-ranges, count offsets, and that sort of thing. By default, it only shows the first 8 rows of the byte array. If you want to view portions of the byte array around other indices, you can pass in markers, and prettyBytes will point out exactly the spots you specified:

import fastparse.byte.all._

val blob = Bytes(
  "The quick brown fox jumps over the lazy dog. She sells " +
  "sea shells on the sea shore but the shells she sells she " +
  "sells no more. Peter piper picked a pack of pickled peppers.":_*
)

assert( prettyBytes(blob, markers = Seq(75, 100)) ==
  """       0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
    |
    |0      54 68 65 20 71 75 69 63 6b 20 62 72 6f 77 6e 20
    |16     66 6f 78 20 6a 75 6d 70 73 20 6f 76 65 72 20 74
    |32     68 65 20 6c 61 7a 79 20 64 6f 67 2e 20 53 68 65
    |48     20 73 65 6c 6c 73 20 73 65 61 20 73 68 65 6c 6c
    |64     73 20 6f 6e 20 74 68 65 20 73 65 61 20 73 68 6f
    |                                        ^
    |80     72 65 20 62 75 74 20 74 68 65 20 73 68 65 6c 6c
    |96     73 20 73 68 65 20 73 65 6c 6c 73 20 73 68 65 20
    |                   ^
    |112    73 65 6c 6c 73 20 6e 6f 20 6d 6f 72 65 2e 20 50
    |128    65 74 65 72 20 70 69 70 65 72 20 70 69 63 6b 65
    |144    64 20 61 20 70 61 63 6b 20 6f 66 20 70 69 63 6b
    |160    6c 65 64 20 70 65 70 70 65 72 73 2e""".stripMargin
)

Byte-Specific Parsers

Many byte formats heavily use the concept of a fixed-width "word" or "double-word": always a multiple of two bytes, and usually parsed into the corresponding twos-complement integer or floating-point number. fastparse.byte provides built-in support for these.
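For example (a sketch, assuming the LE namespace mirrors BE for little-endian decoding), the same two bytes decode to different integers depending on the endianness you pick:

```scala
import fastparse.byte.all._

// Sketch: big-endian vs little-endian 16-bit decoding of the same input
val Parsed.Success(big, _)    = BE.Int16.parse(Bytes(0x01, 0x00))
val Parsed.Success(little, _) = LE.Int16.parse(Bytes(0x01, 0x00))
assert(big == 256, little == 1)
```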

BS

import fastparse.byte.all._

val parser = P( BS(0xDE, 0xAD, 0xBE, 0xEF) )

val Parsed.Success((), 4) = parser.parse(Bytes(0xDE, 0xAD, 0xBE, 0xEF))
val Parsed.Failure(_, 0, _) = parser.parse(Bytes(0xCA, 0xFE, 0xBA, 0xBE))

This is the equivalent of the Basic literal string parser when parsing strings. If the input matches the given sequence of bytes exactly, the parse succeeds; if the input does not match or there is not enough input, the parse fails.

You can also pass a Bytes to BS instead of individual byte values:

import fastparse.byte.all._

val parser1 = P( BS(hex"deadbeef") )

val Parsed.Success((), 4) = parser1.parse(Bytes(0xDE, 0xAD, 0xBE, 0xEF))
val Parsed.Failure(_, 0, _) = parser1.parse(Bytes(0xCA, 0xFE, 0xBA, 0xBE))

val bytes = Bytes(0xDE, 0xAD, 0xBE, 0xEF)
val parser2 = P( BS(bytes) )

val Parsed.Success((), 4) = parser2.parse(Bytes(0xDE, 0xAD, 0xBE, 0xEF))
val Parsed.Failure(_, 0, _) = parser2.parse(Bytes(0xCA, 0xFE, 0xBA, 0xBE))

WordN

fastparse.byte provides convenient parsers for the common case of parsing 1, 2, 4, and 8-byte "words".

import fastparse.byte.all._

def allZeroesByteArray = Bytes(0, 0, 0, 0, 0, 0, 0, 0)

val p1 = P( AnyByte )
val Parsed.Success((), index1) = p1.parse(allZeroesByteArray)
assert(index1 == 1)


val p2 = P( Word16 )
val Parsed.Success((), index2) = p2.parse(allZeroesByteArray)
assert(index2 == 2)


val p3 = P( Word32 )
val Parsed.Success((), index3) = p3.parse(allZeroesByteArray)
assert(index3 == 4)


val p4 = P( Word64 )
val Parsed.Success((), index4) = p4.parse(allZeroesByteArray)
assert(index4 == 8)

These parsers do not capture any result, but merely consume the bytes and move the index forward. If you want to capture the bytes as a Bytes value, use .! to Capture them. If you want to capture them as various sorts of integers or numbers, use one of the IntN parsers listed below.

IntN

FastParse provides parsers to capture single bytes and return them as signed Bytes.

import fastparse.byte.all._

val p = P( Int8 )


val Parsed.Success(result1, _) = p.parse(Bytes(123)) // 7b
assert(result1 == 123)


val Parsed.Success(result2, _) = p.parse(Bytes(-123)) // 85
assert(result2 == -123)


val Parsed.Success(result3, _) = p.parse(Bytes(-1)) // ff
assert(result3 == -1)

An unsigned version of the parser is also available. It lifts the result into a Short, since the higher values of an unsigned byte (e.g. 128) cannot fit into a signed Byte, which maxes out at positive 127.

import fastparse.byte.all._

val p = P( UInt8 )


val Parsed.Success(result1, _) = p.parse(Bytes(123)) // 7b
assert(result1 == 123)


val Parsed.Success(result2, _) = p.parse(Bytes(-123)) // 85
assert(result2 == 133)


val Parsed.Success(result3, _) = p.parse(Bytes(-1)) // ff
assert(result3 == 255)
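The underlying conversion is a simple bit-mask; here is a plain-Scala sketch of it (the helper name `uint8` is hypothetical, not FastParse API):

```scala
// Sketch: interpreting a signed Byte as an unsigned value, lifted into a
// Short. This mirrors what UInt8 does with the byte it consumes.
def uint8(b: Byte): Short = (b & 0xFF).toShort

assert(uint8(123) == 123)  // 7b
assert(uint8(-123) == 133) // 85
assert(uint8(-1) == 255)   // ff
```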

Apart from capturing Bytes, FastParse also provides parsers which capture Shorts, Ints and Longs. These come in both big-endian (BE below) and little-endian (LE) versions:

import fastparse.byte.all._

val p1 = P( BE.Int16 )
val Parsed.Success(result1, _) = p1.parse(Bytes(1, 0)) // 01 00
assert(result1 == 256)


val p2 = P( LE.Int16 )
val Parsed.Success(result2, _) = p2.parse(Bytes(1, 0)) // 01 00
assert(result2 == 1)


val p3 = P( BE.Int32 )
val bytes3 = Bytes(-128, 0, 0, 0) // 80 00 00 00
val Parsed.Success(result3, _) = p3.parse(bytes3)
assert(result3 == -2147483648)


val p4 = P( LE.Int32 )
val bytes4 = Bytes(-128, 0, 0, 0) // 80 00 00 00
val Parsed.Success(result4, _) = p4.parse(bytes4)
assert(result4 == 128)

These also come in signed and unsigned versions:

import fastparse.byte.all._

val p1 = P( BE.Int64 )
val bytes1 = Bytes(1, 0, 0, 0, 0, 0, 0, 0) // 01 00 00 00 00 00 00 00
val Parsed.Success(result2, _) = p1.parse(bytes1)
assert(result2 == 72057594037927936L)


val p2 = P( BE.Int32 )
val bytes2 = Bytes(-128, -128, -128, -128) // 80 80 80 80
val Parsed.Success(result3, _) = p2.parse(bytes2)
assert(result3 == -2139062144)


val p3 = P( BE.UInt32 )
val bytes3 = Bytes(-128, -128, -128, -128) // 80 80 80 80
val Parsed.Success(result4, _) = p3.parse(bytes3)
assert(result4 == 2155905152L)

The unsigned versions of these parsers max out at UInt32; this is because there is no primitive type in Scala that can hold the higher values of an unsigned 64-bit long.
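The same masking trick, one size up, shows how an unsigned 32-bit value is lifted into a Long (plain Scala sketch; the helper name `uint32` is hypothetical):

```scala
// Sketch: lifting a signed 32-bit Int into its unsigned value as a Long,
// as UInt32 does. Values with the top bit set become large positive Longs.
def uint32(i: Int): Long = i & 0xFFFFFFFFL

assert(uint32(-2139062144) == 2155905152L) // 80 80 80 80
assert(uint32(-1) == 4294967295L)          // ff ff ff ff
```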

Note the difference between LE and BE versions of the IntN parsers; there are two standards of ordering bytes, so you have to pick which one you want to use. You can also import LE._ or import BE._ if the endian-ness is the same throughout your parser, and then use Int16/Int32/Int64 directly. Typically, big-endian formats are common in networking applications (e.g. UDP packets) while little-endian formats are common in operating systems and micro-processors.
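To see the two byte orders side by side without FastParse, the JDK's java.nio.ByteBuffer can decode the same pair of bytes both ways:

```scala
import java.nio.{ByteBuffer, ByteOrder}

// The same two bytes decode to different values depending on byte order,
// matching the BE.Int16 / LE.Int16 example above.
val bytes = Array[Byte](1, 0) // 01 00

val be = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getShort
val le = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getShort

assert(be == 256) // big-endian reads 0x0100
assert(le == 1)   // little-endian reads 0x0001
```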

FloatN

Apart from parsing integers using IntN, fastparse.byte also contains parsers for common floating point numbers. These come in both big-endian/little-endian and 32/64-bit versions.

import fastparse.byte.all._

val p1 = P( BE.Float32 )
val bytes1 = Bytes(64, 73, 15, -37) // 40 49 0f db
val Parsed.Success(result1, _) = p1.parse(bytes1)
assert(math.abs(result1 - 3.1415927) < 0.00001)


val p2 = P( LE.Float32 )
val bytes2 = Bytes(-37, 15, 73, 64) // db 0f 49 40
val Parsed.Success(result2, _) = p2.parse(bytes2)
assert(math.abs(result2 - 3.1415927) < 0.00001)


val p3 = P( BE.Float64 )
val bytes3 = Bytes(64, 9, 33, -5, 84, 68, 45, 24) // 40 09 21 fb 54 44 2d 18
val Parsed.Success(result3, _) = p3.parse(bytes3)
assert(math.abs(result3 - 3.1415927) < 0.00001)

Equivalent Byte Parsers

Apart from the byte-specific parsers mentioned above, FastParse's byte-parsing API provides many of the same general-purpose operators as its string-parsing API.

For example, here is a small example using BS primitives and AnyByte

import fastparse.byte.all._

val ab = P( BS(1) ~ AnyByte.! ~ BS(1) )

val Parsed.Success(res, 3) = ab.parse(hex"01 42 01")
assert(res == Bytes(0x42))

val failure @ Parsed.Failure(parser, 2, _) = ab.parse(hex"01 42 43 01")

assert(
  parser == (BS(1): P0),
  failure.msg == """ "01":2 ..."43 01" """.trim
)

Apart from the changes listed above, FastParse's byte-parsing API works about the same as its string-parsing API. Sequence, Repeat, Optional, Either, Capture, Map, FlatMap, Filter and many other operators all work the same when parsing bytes as they do when parsing strings.

The process of Debugging Parsers, Using Cuts or Using Log to figure out what is going on, is all identical between FastParse's byte-parsing and string-parsing APIs.

Here are a bunch of examples demonstrating the use of most of FastParse's byte-parsing features:

Simple Bytes

import fastparse.byte.all._

val parseA = P( BS(1, 2, 3) )

val Parsed.Success((), 3) = parseA.parse(Bytes(1, 2, 3))

val Parsed.Success((), 3) = parseA.parse(hex"01 02 03")

val Parsed.Failure(lastParser, index, extra) = parseA.parse(Bytes(2))
assert(
  lastParser == (BS(1, 2, 3): P0),
  index == 0,
  extra.traced.trace == """parseA:0 / "01 02 03":0 ..."02""""
)

This is the simplest byte-parser available: a single BS containing the bytes you want to match.

Sequence Bytes

import fastparse.byte.all._

val ab = P( BS(1) ~ BS(2) )

val Parsed.Success(_, 2) = ab.parse(Bytes(1, 2))

val Parsed.Failure(parser, 1, _) = ab.parse(hex"01 01") // or BS(1, 1)
assert(parser == (BS(2): P0))

~ matches the left-side parser, and then the right-side parser. You can chain arbitrarily many of these.

Repeat Bytes

import fastparse.byte.all._

val ab = P( BS(1).rep ~ BS(2) )
val Parsed.Success(_, 8) = ab.parse(hex"01 01 01 01 01 01 01 02")
val Parsed.Success(_, 4) = ab.parse(hex"01 01 01 02")

val abc = P( BS(1).rep(sep = BS(2)) ~ BS(3) )
val Parsed.Success(_, 8) = abc.parse(hex"01 02 01 02 01 02 01 03")
val Parsed.Failure(parser, 3, _) = abc.parse(hex"01 02 01 01 02 01 03")

val ab4 = P( BS(1).rep(min = 2, max = 4, sep = BS(2)) )
val Parsed.Success(_, 7) = ab4.parse(hex"01 02 01 02 01 02 01 02 01 02 01 02 01 02 01 02")

val ab4c = P( BS(1).rep(min = 2, max = 4, sep = BS(2)) ~ BS(3) )
val Parsed.Failure(_, 1, _) = ab4c.parse(hex"01 03")
val Parsed.Success(_, 4) = ab4c.parse(hex"01 02 01 03")
val Parsed.Success(_, 8) = ab4c.parse(hex"01 02 01 02 01 02 01 03")
val Parsed.Failure(_, 7, _) = ab4c.parse(hex"01 02 01 02 01 02 01 02 01 03")

.rep lets you repeat a parser zero or more times. By default, this also allows the parser to not match at all (zero times), but you can pass in min = and max = to bound the number of repetitions within a certain range, exactly = to match only a fixed number of repetitions (useful for fixed-width data fields), or a separator via sep = if you are parsing bytes delimited by zeroes or some other separator.

Optional Bytes

import fastparse.byte.all._

val option = P( BS(3).? ~ BS(1).rep().! ~ End )

val Parsed.Success(res1, 3) = option.parse(hex"03 01 01")
assert(res1 == Bytes(1, 1))

val Parsed.Success(res2, 2) = option.parse(hex"01 01")
assert(res2 == Bytes(1, 1))

val Parsed.Success(res3, 1) = option.parse(hex"01")
assert(res3 == Bytes(1))

.? makes a parser optional: if the wrapped parser fails, parsing continues from where it started. Above, you can see the parser matches the `01`s with or without a preceding `03`.

Either Bytes

import fastparse.byte.all._

val either = P( (BS(2, 2) | BS(3, 3, 3) | BS(4)).rep() ~ End )

// any combination of 04 or 02 02 or 03 03 03 succeeds
val Parsed.Success(_, 6) = either.parse(hex"02 02 04 03 03 03")
val Parsed.Success(_, 6) = either.parse(hex"02 02 04 02 02 04")

// if there's a 01, which is none of the above options, fails
val Parsed.Failure(parser, 3, _) = either.parse(hex"02 02 04 01")

| tries the parser on the left, and if that fails, tries the parser on the right; if either of them passes, the whole parser passes, but if both sides fail then the whole either-parser fails.

End/Start Bytes

import fastparse.byte.all._

val noEnd = P( BS(1).rep ~ BS(2) )
val withEnd = P( BS(1).rep ~ BS(2) ~ End )

val Parsed.Success(_, 4) = noEnd.parse(hex"01 01 01 02 01")
val Parsed.Failure(End, 4, _) = withEnd.parse(hex"01 01 01 02 01")

End only succeeds if it is at the end of the input. By default, byte parsers can succeed parsing prefixes of the input. If you want a parser to fail if it doesn't reach the end of an input, use End.

Start is the opposite, only succeeding if at the start of the input:

import fastparse.byte.all._

val ab = P( ((BS(1) | Start) ~ BS(2)).rep ~ End ).!

val Parsed.Success(res1, 4) = ab.parse(hex"01 02 01 02")
assert(res1 == Bytes(1, 2, 1, 2))

val Parsed.Success(res2, 5) = ab.parse(hex"02 01 02 01 02")
assert(res2 == Bytes(2, 1, 2, 1, 2))

val Parsed.Failure(parser, 2, _) = ab.parse(hex"01 02 02")

Pass/Fail Bytes

import fastparse.byte.all._

val Parsed.Success((), 0) = Pass.parse(hex"04 08 15 16 23 42")
val Parsed.Failure(Fail, 0, _) = Fail.parse(hex"04 08 15 16 23 42")

Pass and Fail always pass or fail, respectively. This is useful if you need a "stub" parser that always behaves in some fixed way.

Index Bytes

import fastparse.byte.all._

val finder = P( BS(1, 1, 1).rep ~ Index ~ BS(2, 2, 2) ~ BS(3, 3, 3).rep )

val Parsed.Success(9, _) = finder.parse(hex" 01 01 01  01 01 01  01 01 01  02 02 02  03 03 03")

Index consumes no input, but returns the current index that we've parsed until. Useful for capturing information for debugging later.

Capture Bytes

import fastparse.byte.all._

val capture1 = P( BS(1).rep.! ~ BS(2) ~ End )

val Parsed.Success(res1, 4) = capture1.parse(hex"01 01 01 02")
assert(res1 == Bytes(1, 1, 1))

val capture2 = P( BS(1).rep.! ~ BS(2).! ~ End )

val Parsed.Success(res2, 4) = capture2.parse(hex"01 01 01 02")
assert(res2 == (Bytes(1, 1, 1), Bytes(2)))

val capture3 = P( BS(1).rep.! ~ BS(2).! ~ BS(3).! ~ End )

val Parsed.Success(res3, 5) = capture3.parse(hex"01 01 01 02 03")
assert(res3 == (Bytes(1, 1, 1), Bytes(2), Bytes(3)))

val captureRep = P( BS(1).!.rep ~ BS(2) ~ End )

val Parsed.Success(res4, 4) = captureRep.parse(hex"01 01 01 02")
assert(res4 == Seq(Bytes(1), Bytes(1), Bytes(1)))

val captureOpt = P( BS(1).rep ~ BS(2).!.? ~ End )

val Parsed.Success(res5, 4) = captureOpt.parse(hex"01 01 01 02")
assert(res5 == Some(Bytes(2)))

.! captures the range of bytes parsed by the wrapped parser as a Bytes object, which you can then use for whatever you want. A .! inside an Optional Bytes parser becomes an Option, and a .! inside a Repeat Bytes parser becomes a Seq.

If you want to capture one of the common formats of 8-, 16-, 32- or 64-bit integers or floating point numbers, you should probably use one of the Byte-Specific Parsers instead. You could do it yourself by assembling the number from the raw Bytes, but using the built-ins is less tedious and performs better.

Any Bytes

import fastparse.byte.all._

val ab = P( BS(1) ~ AnyByte.! ~ BS(1) )

val Parsed.Success(res, 3) = ab.parse(hex"01 42 01")
assert(res == Bytes(0x42))

val failure @ Parsed.Failure(parser, 2, _) = ab.parse(hex"01 42 43 01")

assert(
  parser == (BS(1): P0),
  failure.msg == """ "01":2 ..."43 01" """.trim
)

Consumes any single byte. You can also use AnyBytes(count: Int) to consume a fixed number of bytes. Neither captures anything unless you explicitly Capture Bytes with .!.

Positive Lookahead Bytes

import fastparse.byte.all._

val keyword = P( (BS(1, 2, 3) ~ &(BS(4))).!.rep )

val Parsed.Success(res, _) = keyword.parse(hex"01 02 03 04")
assert(res == Seq(Bytes(1, 2, 3)))

val Parsed.Success(Seq(), _) = keyword.parse(hex"01 02 03 05")

&(...) tries the inner parser to ensure it passes; if it does, it backtracks to where it started before the subsequent parser runs. If the inner parser fails, the lookahead fails.

Negative Lookahead Bytes

import fastparse.byte.all._

val keyword = P( BS(1, 2, 3) ~ !BS(0) ~ AnyByte ~ BS(5, 6, 7) ).!

val Parsed.Success(res, _) = keyword.parse(hex"01 02 03 42 05 06 07")
assert(res == Bytes(1, 2, 3, 0x42, 5, 6, 7))

val Parsed.Failure(parser, 4, _) = keyword.parse(hex"01 02 03 00 05 06 07")
assert(parser == !BS(0))

!(...) tries the inner parser to ensure it fails; if it does, it backtracks to where it started before the subsequent parser runs. If the inner parser succeeds, the lookahead fails.

Map Bytes

import fastparse.byte.all._

val binary = P( (BS(0) | BS(1)).rep.! )

val Parsed.Success(res, _) = binary.parse(hex"01 01 00 00")
assert(res == Bytes(1, 1, 0, 0))

val binaryNum = P( binary.map(_.toArray.sum) )
val Parsed.Success(2, _) = binaryNum.parse(Bytes(0x01, 0x01, 0x00, 0x00))

Lets you transform the current result of this parser (whether a Byte/Short/Int/Long/Float/Double from one of the Byte-Specific Parsers or a Bytes from a Capture Bytes) into something else: put it in a case class, convert it to an Array[Byte], whatever.

Filter Bytes

import fastparse.byte.all._

val nullTerminated = P( (Int8.filter(_ != 0).rep().! ~ BS(0)).rep() )

val Parsed.Success(res, _) = nullTerminated.parse(
  hex"de ad be ef 00 13 37 00"
)
assert(res == Seq(hex"de ad be ef", hex"13 37"))

Lets you check a parser with a predicate; if the predicate fails, that parser fails.

FlatMap Bytes

import fastparse.byte.all._

val lengthPrefixed = P( Int8.flatMap(AnyBytes(_).!).rep() )

val Parsed.Success(res, _) = lengthPrefixed.parse(
  Bytes(0x04, 0xde, 0xad, 0xbe, 0xef, 0x02, 0x13, 0x37)
)
assert(res == Seq(Bytes(0xde, 0xad, 0xbe, 0xef), Bytes(0x13, 0x37)))

Lets you choose the subsequent parser based on the result of the previous parser; useful for length-delimited fields, where a preceding fixed-width length field determines how many subsequent bytes make up the actual contents of the field.

BytesWhile

import fastparse.byte.all._

val nullTerminated = P( (BytesWhile(_ != 0).! ~ BS(0)).rep() )

val Parsed.Success(res, _) = nullTerminated.parse(
  hex"de ad be ef 00 13 37 00"
)
assert(res == Seq(hex"de ad be ef", hex"13 37"))

A fast way of consuming lots of bytes as long as the given predicate returns true. Useful for things like 0-delimited fields, where you want to consume the input until the next 0 byte.

The predicate is used to build a lookup table, which is then used to speed up parsing. You can use BytesWhile.raw if you want to avoid pre-computing the lookup table, in which case the predicate is re-run repeatedly during parsing.
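The lookup-table idea itself is straightforward; here is a plain-Scala sketch of pre-computing a byte predicate into a 256-entry table (illustrative only, not BytesWhile's actual implementation):

```scala
// Sketch: pre-computing a Byte => Boolean predicate into a lookup table,
// so that parsing can index an array instead of re-running the predicate.
def buildTable(pred: Byte => Boolean): Array[Boolean] = {
  val table = new Array[Boolean](256)
  for (i <- 0 until 256) table(i) = pred(i.toByte)
  table
}

// A parser would index the table with (b & 0xFF) for each input byte b.
val nonZero = buildTable(_ != 0)
assert(nonZero(0x42) == true)
assert(nonZero(0) == false)
```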

BytesWhileIn

import fastparse.byte.all._

val nullTerminated = P( (BytesWhileIn(hex"de ad be ef".toSeq).! ~ BS(0)).rep() )

val Parsed.Success(res, _) = nullTerminated.parse(
  hex"de ad be ef 00 13 37 00"
)
assert(res == Seq(hex"de ad be ef"))

A fast way of consuming lots of bytes as long as each byte is within the given set. Useful for fields whose contents are restricted to a known set of byte values.

The set of allowed bytes is used to build a lookup table, which is then used to speed up parsing.

BytePred

import fastparse.byte.all._

val nullTerminated = P( (BytePred(_ != 0).rep().! ~ BS(0)).rep() )

val Parsed.Success(res, _) = nullTerminated.parse(
  Bytes(0xde, 0xad, 0xbe, 0xef, 0x00, 0x13, 0x37, 0x00)
)
assert(res == Seq(Bytes(0xde, 0xad, 0xbe, 0xef), Bytes(0x13, 0x37)))

Consumes a single-byte, satisfying any arbitrary predicate.

The predicate is used to build a lookup table, which is then used to speed up parsing. You can use BytePred.raw if you want to avoid pre-computing the lookup table, in which case the predicate is re-run repeatedly during parsing.

Example Byte Parsers

UDP Packets

This is an example UDP Datagram packet-parser implemented using FastParse's byte-parsing API:

import fastparse.byte.all._

case class UdpPacket(sourcePort: Int,
                     destPort: Int,
                     checkSum: Int,
                     data: Bytes)

// BE.UInt16 stands for big-endian unsigned-16-bit-integer parsers
val udpHeader = P( BE.UInt16 ~ BE.UInt16 ~ BE.UInt16 ~ BE.UInt16 )

val udpParser = P(
  for{
    (sourcePort, destPort, length, checkSum) <- udpHeader
    data <- AnyBytes(length - 8).!
  } yield UdpPacket(sourcePort, destPort, checkSum, data)
)

Here's how to use it:

val bytes = hex"""
  04 89 00 35 00 2C AB B4 00 01 01 00 00 01 00 00
  00 00 00 00 04 70 6F 70 64 02 69 78 06 6E 65 74
  63 6F 6D 03 63 6F 6D 00 00 01 00 01
"""

val Parsed.Success(packet, _) = udpParser.parse(bytes)
assert(
  packet.sourcePort == 1161,
  packet.destPort == 53,
  packet.checkSum == 43956,
  packet.data.length == 36,
  packet.data == hex"""
    00 01 01 00 00 01 00 00 00 00 00 00 04 70 6F 70
    64 02 69 78 06 6E 65 74 63 6F 6D 03 63 6F 6D 00
    00 01 00 01
  """
)

BmpParser

BmpParser is a good example of a slightly more complex byte parser: a parser for Bitmap image files. It's not small, but its structure is pretty simple. It does not support the full breadth of the Bitmap format, but it supports enough to parse many common images.

Like all other FastParse parsers, FastParse's Byte parsers all run in the browser via Scala.js, and so does BmpParser! Here is a live demo: upload a bitmap file, and it will parse it and print out some simple metadata (width, height, etc.):

If you don't have any .bmp files around, you can download the classic Lena image.

BmpParser Walkthrough

Here, we'll walk through the implementation of BmpParser, which should give you a taste of how FastParse's Byte Parsers work in general.

import fastparse.byte.all.LE._

First of all, we import the little-endian parsers, because the BMP format uses little-endian byte order.

val bmp = P(
  for{
    fileHeaderData <- fileHeader
    headerData <- header
    pixels <- bmpRow(headerData.infoPart.width, headerData.infoPart.bitsPerPixel)
                .rep(exactly=headerData.infoPart.height)
  }yield Bmp(fileHeaderData, headerData, pixels.reverse)
)

A BMP file consists of two headers (a file header and an info header) followed by rows of pixels with padding. The difficulties are that there are several versions of the info header, which the parser must distinguish and process correctly, and that the size and padding of each row depend on information from the header.

val fileHeader = {
  val headerType = UInt16
  val size = Int32
  val offset = Int32
  P( headerType ~ size ~ Word16 ~ Word16 ~ offset ).map(FileHeader.tupled)
}

val infoHeaderPart = {
  val width = Int32
  val height = Int32
  val colorPlanes = UInt16
  val bitsPerPixel = UInt16
  val compression = Int32
  val imageSize = Int32
  val horzRes = Int32
  val vertRes = Int32
  val colorUsed = Int32
  val colorsImportant = Int32
  P(
    width ~ height ~
      colorPlanes ~ bitsPerPixel ~
      compression ~ imageSize ~
      horzRes ~ vertRes ~
      colorUsed ~ colorsImportant
  ).map(
    s => BitmapInfoHeader(BitmapInfoHeaderPart.tupled(s))
  )
}

val v2HeaderPart = {
  val RgbPart = P( Word32 )
  P( infoHeaderPart ~ RgbPart.rep(exactly=3) )
}

val v3HeaderPart = {
  val AlphaPart = P( Word32 )
  P( v2HeaderPart ~ AlphaPart )
}

val v4HeaderPart = P( v3HeaderPart ~ AnyBytes(52) )

val v5HeaderPart = P( v4HeaderPart ~ AnyBytes(16) )


val infoHeader = P( BS(40, 0, 0, 0) /* 40 bytes */ ~/ infoHeaderPart )
val v2Header = P( BS(52, 0, 0, 0) ~/ v2HeaderPart )
val v3Header = P( BS(56, 0, 0, 0) ~/ v3HeaderPart )
val v4Header = P( BS(108, 0, 0, 0) ~/ v4HeaderPart )
val v5Header = P( BS(124, 0, 0, 0) ~/ v5HeaderPart )

val header = P( infoHeader | v2Header | v3Header | v4Header | v5Header)

The first difficulty is handled by the similar parsers describing the 5 versions of the info header (v2HeaderPart, v2Header, v3HeaderPart, v3Header, ...).

def bmpRow(width: Int, bitsPerPixel: Int): P[Seq[Pixel]] = {
  val bytesPerPixel = bitsPerPixel / 8
  val padding = (width * bytesPerPixel) % 4
  P( AnyBytes(bytesPerPixel).!.~/.rep(exactly=width) ~/ AnyBytes(padding) ).map(
    pixels => pixels.map(Pixel)
  )
}

The second difficulty is handled by the bmpRow function, which computes the parameters of a row and creates the parser on the fly.

Note also a few tricks for parsing binary data.

MidiParse

MidiParse is an example byte-parser written using FastParse, for parsing MIDI music files. Like all FastParse parsers, it compiles to Javascript via Scala.js and can run in the browser: below is a small widget that uses MidiParse to parse, analyze and play any MIDI file you upload, using Keith Horwood's AudioSynth Javascript library.

If you don't have any midi files lying around, perhaps you could try one of these:

If you wish, you can also check out this project and run sbt "fastparseJVM/test:run go.mid" from the command line to play the same MIDI file using the same parser, but running on Scala-JVM and using the JVM's MIDI player rather than the browser's. In general, the JVM player is much more robust than the Web player: the Web player doesn't know how to handle changes in pitch bend, polyphonic pressure, channel pressure or controller, and plays all notes as a single piano-like instrument.

MidiParse comprises about 50 lines of case classes that define the structure of the midi file, and another 140 lines of FastParse code for parsing the MIDI from binary input:

MidiParse illustrates many of the quirks of writing a parser for a real-world binary format:

Other than these quirks, the majority of the parser is relatively straightforward FastParse code.

ClassParse

The other example byte-parser that FastParse provides is ClassParse. It's quite a complex parser that processes .class files full of Java bytecode, and parses them into a simple structured format. You can see the code here:

ClassParse is able to retrieve almost all the information in a given class, including methods, fields, subclasses and code, and pack it into a convenient AST. Like the other big parsers, it has a good set of unit tests, which compile real projects from GitHub (joda-time, junit4, jenkins, etc.) and check each built class file.

Like all FastParse parsers, ClassParse compiles to Javascript using Scala.js and can run in the browser. Below is a short demo of ClassParse in action: upload any .class file full of Java bytecode, and it will parse it and print out the names and signatures of all fields and methods defined in that class:

If you don't have any .class files around, you can download some examples to try:

You can download the classfiles and directly try them in this demo, or you can download the sources and compile them yourself with javac before uploading the generated class file.

This is available on Maven Central as

"com.lihaoyi" %%% "classparse" % "0.4.3"

Performance


FastParse will never be able to compete with hand-written recursive descent parsers for speed. However, it's no slouch either; here's a comparison of FastParse with alternatives, using Parboiled2's JSON parsing benchmark, which parses a ~21,500 line JSON file:

Benchmark                    Score     Error
fastparse                     80.536   ± 0.942
fastparse-no-trace            89.873   ± 0.875
argonaut                     164.092   ± 2.869
json4s-jackson               285.637   ± 3.954
json4s-native                142.964   ± 2.076
parboiled2                    87.586   ± 1.176
scala-parser-combinators       0.976   ± 0.018
spray-json                   189.784   ± 2.825

These numbers are the number of iterations/second of parsing a sample test.json file, averaged over 200 runs. As you can see, the FastParse-based parser comes within a factor of 4 of the fastest hand-written parser (Jackson), is just as fast as the Parboiled2-based parser (slightly faster/slower depending on whether full tracing is enabled), and is almost 100x faster than the scala-parser-combinators library.

In exchange for the perf hit compared to hand-rolled solutions, you get the short, super-simple parser definition, and excellent error reporting for free. While for super-high-performance use cases you may still want a hand-rolled parser, for many ad-hoc situations a FastParse parser will do just fine. Remember, even at "only" 89 iterations per second, that is still parsing 1,900,000 lines of JSON every second!

A similar speed ratio can be seen in parsing a sample Scala file using FastParse, Parboiled2 and Scalac's inbuilt hand-written Scala-language parser:

Benchmark             Score    Error
fastparse              320.7    15.4
fastparse-no-trace     434.7    23.4
parboiled2            1354       7.97
scalac                4888     113

These numbers are the number of iterations over 30 seconds, average of 4 runs, with 2 runs of warmup (discarded). FastParse performs worse here, at 11.5x slower than Scalac's in-built parser, and 3x slower than the equivalent Parboiled2-based parser. Depending on what you're doing, that may or may not be a problem: ScalaParse still makes progress at 57,027 lines of Scala per second, which despite being slower than the others is still blazing fast.

Improving Performance

There are many ways to improve performance of your FastParse parsers. If you study the example parsers included in the repo, those already have many of these techniques applied, and if you follow the same style you'll probably do ok. Nevertheless, here are some concrete tips:

Startup Performance

FastParse performs a lot of its optimizations on startup; this usually does not matter, e.g. if you're building a server or other long-running process, but sometimes it does. Particularly if you are using Scala.js, which has a combination of lower performance and higher expectations (users hate waiting!).

To improve startup times, here are a few tips:

Debugging Parsers


For the vast majority of the time you spend working with FastParse, your parsers will be incorrect. This is almost by definition: once your parser is correct, you'll be done and can go do something else with your life!

Even if your parsers are correct, often you'll find yourself parsing broken input:

No matter what goes wrong, someone will need to figure it out and fix it. Thus FastParse puts a lot of effort into making working with broken parsers and input as easy as possible.

Let's take an example Parser:

object Foo{
  import fastparse.all._
  val plus = P( "+" )
  val num = P( CharIn('0' to '9').rep(1) ).!.map(_.toInt)
  val side = P( "(" ~ expr ~ ")" | num )
  val expr: P[Int] = P( side ~ plus ~ side ).map{case (l, r) => l + r}
}

This is a simple parser that parses some basic arithmetic expressions: 1+2, (1+2)+(3+4), etc. It's simpler than the Math parser shown at the top of the page, as it does not handle multiple operators in a row e.g. 1+2+3+4, nor does it handle operator precedence. Nevertheless it will be enough to show how error handling and debugging works.

If we run the parser on a bad input, though, we get this:

Foo.expr.parse("(1+(2+3x))+4")
// Failure(("(" ~ expr ~ ")" | num):1:1 ..."(1+(2+3x))")

As you can see, the error message is pretty generic: "I tried to parse "(" ~ expr ~ ")" or a num at index 0". Why does it tell us that?

Using Cuts

The answer is that as far as FastParse knows, you could have wanted either the "(" ~ expr ~ ")" or the num at that position, and it doesn't know which one. Thus even though it starts off parsing a paren, when that branch eventually fails (it tries to parse a ")" at index 7, but finds an "x") it backtracks out of the "(" ~ expr ~ ")" parser and then tries to parse num. When that fails, it doesn't know which side was "meant" to succeed, and so it gives up and just tells you both sides failed to parse.

Although FastParse doesn't know which branch was meant to succeed, we know that once we've parsed a "(", it can no longer parse a number! Thus there's no point in backtracking and trying that side of the |. We can tell FastParse this fact by adding a Cut ~/ after the "(":

object Foo{
  import fastparse.all._
  val plus = P( "+" )
  val num = P( CharIn('0' to '9').rep(1) ).!.map(_.toInt)
  val side = P( "(" ~/ expr ~ ")" | num )
  val expr: P[Int] = P( side ~ plus ~ side ).map{case (l, r) => l + r}
}

Now, once FastParse sees a "(", it can no longer backtrack! Thus it knows that whatever error occurs later, it must be because it failed to parse a ")" and not because num failed. Then the error message becomes much more precise and useful:

Foo.expr.parse("(1+(2+3x))+4")
// Failure(")":1:8 ..."x))+4")

Using Log

We can add Log calls to make FastParse tell us a lot more about what a parser is doing. For example, if we want to know whenever a side or expr is being attempted, we can add .log() to those two parsers to find out:

object Foo{
  import fastparse.all._
  val plus = P( "+" )
  val num = P( CharIn('0' to '9').rep(1) ).!.map(_.toInt)
  val side = P( "(" ~/ expr ~ ")" | num ).log()
  val expr:P[Int] = P( side ~ plus ~ side ).map{case (l, r) => l + r}.log()
}

Then when you run it on an invalid input:

Foo.expr.parse("(1+(2+3x))+4")

You get a dump of everything the logged parsers are trying to do

+expr:1:1
  +side:1:1
    +expr:1:2
      +side:1:2
      -side:1:2:Success(1:3)
      +side:1:4
        +expr:1:5
          +side:1:5
          -side:1:5:Success(1:6)
          +side:1:7
          -side:1:7:Success(1:8)
        -expr:1:5:Success(1:8)
      -side:1:4:Failure(side:1:4 / ")":1:8 ..."(2+3x))+4", cut)
    -expr:1:2:Failure(expr:1:2 / side:1:4 / ")":1:8 ..."1+(2+3x))+", cut)
  -side:1:1:Failure(side:1:1 / expr:1:2 / side:1:4 / ")":1:8 ..."(1+(2+3x))", cut)
-expr:1:1:Failure(expr:1:1 / side:1:1 / expr:1:2 / side:1:4 / ")":1:8 ..."(1+(2+3x))", cut)

+ is when a parser is started, - is when it finishes with either a success or a failure. In the case of failure, it tells you what the stack was when it failed.

The general strategy for adding .logs is:

  1. Is my parser misbehaving? This is usually obvious from seeing parse failures when there shouldn't be
  2. Are any sub-parsers which I believe should be succeeding/failing/getting-called, aren't? Add logging to the sub-parsers. You can do this at the definition-site of the sub-parsers as shown above, or to the use-site e.g. side.log("SIDE 1") ~ plus ~ side.log("SIDE 2") if the parser is used in multiple places and you only want to log this usage.
  3. Look at the logging, see some parser behaving strangely. Go to 1.

In general, you do not want to add too many .log() calls otherwise the output becomes noisy and verbose and hard to understand. Adding them incrementally helps you narrow down the problem while still keeping the output readable.

Tracing

By default, on failure, FastParse only provides the index and the last parser which failed at that index. This is information FastParse already has and is thus cheap to provide, and often is enough to show what went wrong, and where. If you prefer row & column, you can trivially compute that from the input & index by counting newlines inside the input string.
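That newline-counting computation can be sketched as follows (a hypothetical helper, not part of FastParse):

```scala
// Sketch: converting a failure index into a 1-based (row, column) pair by
// counting newlines in the input before the index, as suggested above.
def rowCol(input: String, index: Int): (Int, Int) = {
  val before = input.take(index)
  val row = before.count(_ == '\n') + 1
  val col = index - before.lastIndexOf('\n')
  (row, col)
}

// Index 7 in the single-line input below is row 1, column 8 -- matching
// the ")":1:8 position reported in the failure messages above.
assert(rowCol("(1+(2+3x))+4", 7) == (1, 8))
assert(rowCol("ab\ncd", 4) == (2, 2))
```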

Often you want something more, though, and for that FastParse provides tracing, as described in the documentation of Parsing Results. By accessing the .traced lazy val on a Failure, FastParse will perform a whole second parse on the original input, starting from the same index, but with additional tracing code to keep track of what's happening. This typically costs ~2x as much as the original parse, so isn't done by default, but it's trivial to ask for it.

For example, this is tracing being done on an invalid input we know will fail:

val Parsed.Failure(expected, idx, extra) = Foo.expr.parse("(1+(2+3x))+4")

We know that this input will fail, because our grammar (defined earlier) does not contain an "x" in it! It only handles numbers and "+" and parentheses. expected gives us the parser that it needed in order to continue (with a readable .toString) and idx the index that it failed at:

> (idx, expected) // Last index and last parser at which it failed
(7, ")")

But there is also the extra result which contains additional information you can use, e.g. the extra.traced result.

> extra.traced.trace // The named parsers in the stack when it failed
expr:1:1 / side:1:1 / expr:1:2 / side:1:4 / (")" | CharIn("0123456789")):1:8 ..."x))+4"

> extra.traced.stack // Same as .trace, but as a List[Frame] rather than String
Vector(
  Frame(0,expr), // (1+(2+3x))+4
  Frame(0,side), // (1+(2+3x))+4
  Frame(1,expr), //  1+(2+3x))+4
  Frame(3,side)  //    (2+3x))+4
)

As you can see, tracing gives us a much more detailed view: every parser in the stack when the parse failed, and the indices at which they were tried. Apart from getting it as a readable string via .trace, you can also get it as structured data via .stack in case you want to manipulate it programmatically.

FastParse also provides the .traceParsers value, which tells you every single parser which could have succeeded at the index where parsing failed. In this case:

> extra.traced.traceParsers // Every parser that could have succeeded at Failure#index
Set(")", CharIn("0123456789"))

Thus, we can see that although FastParse last tried the ")" parser, it earlier also tried the CharIn("0123456789") parser at the same spot. This makes perfect sense: if instead of "x" we had a digit of some kind, parsing could have continued! And we do not need to figure this out ourselves; FastParse knows and can tell you.

Lastly, tracing gives you access to the .fullStack of the failure, which contains every parser in the stack when it failed, not just the ones with names!

Every expression in a fastparse parser is itself a parser: "(" is a parser, "(" ~ expr ~ ")" is a parser, and so on. With .fullStack, we can see in great detail what FastParse was trying to do when it failed:

> extra.traced.fullStack // Every single parser in the stack when it failed
Vector(
  Frame(0,expr), Frame(0,expr),                    Frame(0,side ~ plus ~ side),
  Frame(0,side), Frame(0,"(" ~/ expr ~ ")" | num), Frame(1,"(" ~/ expr ~ ")"),
  Frame(1,expr), Frame(1,expr),                    Frame(3,side ~ plus ~ side),
  Frame(3,side), Frame(3,"(" ~/ expr ~ ")" | num), Frame(7,"(" ~/ expr ~ ")")
)

Instrumenting Parsers

FastParse provides an instrument argument, which you can use to inject a callback that lets you run code before or after every named parse (defined inside a P) is called. This can be used for a wide variety of things. For example, you could use it to count how many times each parser in the example Math parser gets called when you parse a simple arithmetic expression:

import scala.collection.mutable

val callCount = mutable.Map.empty[String, Int]

val instrumentFunction = (parser: Parser[_], index: Int, continuation: () => Parsed[_]) => {
  callCount(parser.toString) = callCount.getOrElse(parser.toString, 0) + 1
}

expr.parse("((1+1*2)+(3*4*5))/3", instrument = instrumentFunction)

val expectedCallCount = Map(
  "expr" -> 1,
  "addSub" -> 4,
  "divMul" -> 6,
  "factor" -> 10,
  "number" -> 10,
  "parens" -> 3
)
assert(callCount == expectedCallCount)

This is useful as FastParse isn't really profile-able by "normal" profilers: they tend to tell you what kind of parsers are getting called a lot and taking up all your time (e.g. Sequence, Either, ...) but not which of your parsers (e.g. expr, numbers) are getting called and taking up time. By using instrument, you can perform these counts yourself, even keeping track of System.nanoTimes if you want to count the actual time being taken.
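The timing idea can be sketched in plain Scala. Here timed stands in for the body of an instrument callback, with runParser playing the role of the continuation; the names are illustrative, not part of FastParse's API:

```scala
import scala.collection.mutable

// Accumulated wall-clock nanoseconds per parser name
val timings = mutable.Map.empty[String, Long].withDefaultValue(0L)

// Run the wrapped parser and add its elapsed time to the tally
def timed[T](name: String)(runParser: () => T): T = {
  val start = System.nanoTime()
  val result = runParser()
  timings(name) += System.nanoTime() - start
  result
}

// Usage with a stand-in "parser" in place of a real continuation:
val out = timed("number")(() => "42".toInt)
// timings("number") now holds the elapsed nanoseconds for that call
```

Inside a real instrument callback, you would call continuation() where runParser() is called here, keyed by parser.toString.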

The above example runs code before each named P(...) parser is run. The instrument callback also provides a continuation callback as its third argument, which you can use to run the current parser and run code after the parser is run:

val resultCount = mutable.Map.empty[(String, Boolean), Int]
val instrumentFunction = (parser: Parser[_], index: Int, continuation: () => Parsed[_]) => {
  val result = continuation()
  val resultKey = (parser.toString, result.isInstanceOf[Parsed.Success[_]])
  resultCount(resultKey) = resultCount.getOrElse(resultKey, 0) + 1
}

In this case, we are using it to count not just how many times each named P(...) parser is run, but also how many times it succeeds and fails. We can parse a "good" input and see that most of the attempts by named parser succeed:

// Good Parse
expr.parse("((1+1*2)+(3*4*5))/3", instrument = instrumentFunction)

val expectedResultCount = Map(
  ("expr", true) -> 1,
  ("addSub", true) -> 4,
  ("divMul", true) -> 6,
  ("factor", true) -> 10,
  ("number", true) -> 7,
  ("number", false) -> 3,
  ("parens", true) -> 3
)
assert(resultCount == expectedResultCount)

And we can parse invalid input, and see that it results in many of the named parsers failing repeatedly in the course of the parse:

// Bad Parse
resultCount.clear()
expr.parse("((1+1*2)+(3*4*))/3", instrument = instrumentFunction)

val expectedResultCount2 = Map(
  ("expr", false) -> 1,

  ("addSub", true) -> 1,
  ("addSub", false) -> 3,

  ("divMul", true) -> 3,
  ("divMul", false) -> 3,

  ("factor", true) -> 6,
  ("factor", false) -> 3,

  ("number", true) -> 5,
  ("number", false) -> 4,

  ("parens", true) -> 1,
  ("parens", false) -> 3
)
assert(resultCount == expectedResultCount2)

Which is what you would expect, since the parse overall failed! This measurement of pass/fail rates for each parser is a useful tool for understanding how the overall parser behaves at runtime: which individual parsers are being used over and over and are worth optimizing, and which are called only a handful of times and are not worth bothering with. If an individual parser is being called over and over, you could optimize its internals to make it faster, or you could try to refactor the rest of the overall parser so this individual parser gets called fewer times.

In this case we only run code after the parser has completed, but we could easily run code both before and after if we wanted to e.g. measure the time each parser took to complete. You could also put printlns inside your instrument callback to build your own custom logging functionality, if for some reason the default logging was not to your liking.

Instrumenting parsers typically isn't something you want to use on a "production" parser. Apart from slowing the parser down, it also is relatively unsafe: every instrument call is given untyped Parser[_] and Parsed[_] values, which you would need to cast to make any use of. It is generally both safer and more performant to build your parser result using Capture, Map and FlatMap.

Nevertheless, instrument is still a useful tool to have available in your toolbox if you are unsure of what a parser is doing, and can give you insight into the runtime behavior of your parsers which is hard to get otherwise.

Use Cases

What's the purpose of all this detailed error reporting? The goal is three-fold:

In general, FastParse's error reporting is detailed and structured. As a developer, most of your time interacting with your parser is spent when it is incorrect and throwing errors at you. As a user, most of your time interacting with the parser is spent when your input is incorrect and it is throwing errors at you. This is almost self-evident: once your parser is correct and your input is correct, you're done and can go do other things.

Thus, FastParse makes an effort to make the error reporting both detailed and structured. This means as a developer you can quickly diagnose problems, and (if you wish to) put in effort to use the structured errors to help your users diagnose problems. That makes life better for everybody.

Comparisons


FastParse differs from all other parser-combinator libraries in the Scala universe, in quite substantial ways:

Internals


FastParse's internals are straightforward. Nonetheless, its design is unlike any other combinator library I've seen: externally immutable, pure-functional parser-combinators with mutable, highly-optimized internals.

Fast Interpreter

FastParse is designed as a fast, immutable interpreter. That means

In theory, it could be possible to perform either compile-time or initialization-time (before actually parsing) optimizations on the parser to improve performance. So far, I have not managed to find a scheme that yields a significant improvement at an acceptable cost in complexity. Apart from trivial de-sugarings (e.g. merging together (p1 | p2) | p3 into a single Either node), what you write is what gets run.

External Immutability

FastParse presents a pure-functional, immutable external API to the user. That means that you can call Parser[T]#parse without having to set up any necessary state or instantiate objects. You take a Parser[T], call .parse, and get a Success[T] or Failure.

However, immutability poses a challenge: immutability usually involves lots of "copy & update" operations rather than "mutation" operations, and on the JVM that means excessive garbage generation and collection. This is harmful for performance.

Thus FastParse performs some tricks internally to save allocations: the immutable Parser.Success and Parser.Failure result types are actually interfaces hiding Parser.Mutable.Success and Parser.Mutable.Failure implementations, which have entirely mutable fields.

This means that the same Parser.Mutable.Success and Parser.Mutable.Failure objects are shared throughout an entire parsing run, mutated as the parse progresses, while the external user only sees an immutable facade. This also means that a run of the large-and-complex ScalaParse on a hundreds-of-kb source file results in exactly three allocations in all: one Parser.Mutable.Success, one Parser.Mutable.Failure, and one Ctx object holding them together.
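The pattern can be illustrated in a few lines of plain Scala. This is a minimal sketch of the idea, not FastParse's actual classes: an immutable-looking trait backed by a single reused mutable instance.

```scala
// What callers see: a read-only view of the parse result
trait Success { def index: Int }

// What the parser actually allocates, once, and mutates in place
class MutableSuccess extends Success {
  var index: Int = 0
}

val shared = new MutableSuccess
shared.index = 5                // reused and updated across the whole parse run
val view: Success = shared      // callers only see the immutable facade
```

The caller holding a Success has no way to mutate it, while the parser internals update the one shared instance as the parse advances.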

Internal Optimizations

FastParse does some things that take advantage of the type-directed nature of the result-aggregation: while Parser[T].rep returns a Parser[Seq[T]] for an arbitrary T, there is a short circuit such that Parser[Unit].rep simply returns Parser[Unit]. This lets the common case of "parsing things, not caring about the result" avoid the allocation, while still allowing you to stick some other type in there (e.g. Any) if you really do care about the bucket-of-Units.

FastParse also takes advantage of the fact that Parsers are immutable. That makes it feasible to make instantiation mildly expensive, since each one only gets instantiated once rather than per-parse. As an example, CharIn, CharPred and CharsWhile all have their predicate converted to a bit-set to make character lookups extremely fast. Similarly, StringIn gets converted into a Trie in order to allow one-pass matching of dozens of strings at the same time.

These operations are not cheap: the bitsets easily take a few KB of memory each, and involve 65k iterations to fill them in. However, since Parsers are immutable, this one-time cost goes from "ridiculous" to "acceptable". All these internal optimizations are completely opaque to the user, who (apart from the performance benefit) never needs to think about them.
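The bit-set trick can be sketched in plain Scala. This is an illustrative stand-in for what a character-class parser might precompute, not FastParse's actual code: a Boolean lookup table over all 65536 Char values, so per-character checks become a single array access.

```scala
// Build a lookup table once, at parser construction time
def makeLookup(pred: Char => Boolean): Array[Boolean] = {
  val table = new Array[Boolean](65536)   // one slot per Char value
  var c = 0
  while (c < 65536) {
    table(c) = pred(c.toChar)
    c += 1
  }
  table
}

// 65k predicate calls up front; each later check is table(ch) only
val isDigit = makeLookup(c => c >= '0' && c <= '9')
```

The one-time construction runs the predicate 65536 times, but every subsequent character test during parsing is a plain array index.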

Parsing Only

Fastparse, unlike libraries like Scodec or Python's Construct, only allows you to define how to parse data from serialized text/bytes into structured data, and not the other way around.

It is probably possible to write a wrapper on top of FastParse that allows two-directional serialization, but such a thing does not exist right now in the FastParse codebase, and would likely be pretty difficult to design and implement. If you need such a thing, you could try writing your own wrapper and (if it works out well) contributing it back to FastParse, or otherwise just use something else like Scodec which has this capability built in.

Abstraction

FastParse's basic parsers are all generic: things like Sequence, Repeat, Optional, etc. are written to work on any sort of input, whether parsing Strings or Array[Byte]. The Parser and Result types are themselves generic and can work on multiple input and output types, hence the rather complex signatures:

trait Parser[+T, Elem, Repr]
sealed trait Parsed[+T, Elem]

From this generic baseline, FastParse defines multiple "Api" modules. Each of these specializes the "generic" FastParse API into something that's more convenient for a task at hand:

Each FastParse API has type aliases that narrow the type for its particular use case: e.g. fastparse.all has aliased:

type Parsed[+T] = core.Parsed[T, String]
type Parser[+T] = core.Parser[T, Char, String]

While fastparse.byte has aliased:

type Parsed[+T] = core.Parsed[T, Array[Byte]]
type Parser[+T] = core.Parser[T, Byte, Array[Byte]]

Thus, you can generally work directly with the aliases defined in the relevant Api module without worrying about the more generic form, though most of the core parsers are written generically to work in any Api module. If you wished to, you too could write a custom parser that can parse both Strings and Array[Byte], by inheriting from the more generic Parser[+T, Elem, Repr] trait.

Apart from the direct aliases, FastParse also provides alternate names for the "generic versions" of some of the common parsers:

This is for backwards compatibility, and to allow you to use more relevant names when writing your parser (e.g. AnyChar instead of AnyElem).

Change Log


0.4.3

0.4.2

0.4.1

0.4.0

0.3.7

0.3.6

0.3.5

0.3.4

0.3.3

0.3.2

0.3.1

0.2.1

0.2.0

0.1.7

0.1.6