Beyond Liskov: Type Safe Equality in Scala

Posted 2019-12-15

How an Optimizing Compiler Works Standardizing IO Interfaces for Scala Libraries

The Scala language, following its Java heritage, allows you to compare any two values for equality regardless of their respective types. This can be very error prone, since a refactor that changes the type of one of your values may silently result in an equality check that can never return true. This blog post explores how a "safer" equality check is just one of many features that go against the principles that underlie any type system with subtyping, and achieving it requires a fundamental re-thinking of what it means to be "type safe".

About the Author: Haoyi is a software engineer, and the author of many open-source Scala tools such as the Ammonite REPL and the Mill Build Tool. If you enjoyed the contents on this blog, you may also enjoy Haoyi's book Hands-on Scala Programming

The Equality Problem

In Scala, the equality operator == is defined as follows:

abstract class Any{
  final def ==(that: Any): Boolean
}

This is roughly equivalent to the Java definition:

public class Object {
    public boolean equals(Object obj)
}

Any is the base of the Scala type hierarchy, just like Object is the base class of the Java type hierarchy. For most of this post I will be referring to Scala's Any and its type system, but everything said applies just as well to Java as well.

Any being the base of Scala's type hierarchy means that every type extends Any:

Scala's type hierarchy is a hierarchy because it includes sub-typing: the idea that a type is a child of another type, perhaps with additional methods, constraints, or functionality. The idea of subtyping should be familiar to anyone with programming experience, given the prevalence of Object Oriented languages which make heavy use of this concept.

Because == is a method on Any that takes an argument of Any, that means it is not a compile error == on any two values, whether they have a chance of being equal:

@ def validateUserId1(x: Int, y: Int) = {
    println(x == y)
  }

@ validateUserId1(1, 1)
true

@ validateUserId1(1, 123)
false

@ validateUserId1(123, 123)
true

Or not:

@ def validateUserId2(x: Int, y: String) = {
    println(x == y)
  }

@ validateUserId2(1, "one")
false

@ validateUserId2(1, "1")
false

@ validateUserId2(123, "123")
false

The validateUserId2 case seems a bit egregious: isn't the point of a type system to help avoid this kind of bugs, where we are writing meaningless code? It always returns false, so the programmer likely made an error.

While it might not make sense to write a function like validateUserId2 directly, it is quite easy to accidentally end up with a function like validateUserId2 during a refactor, where e.g. you are converting your user IDs from Ints to Strings, or vice versa. Isn't a type system meant to help catch these cases, where we are performing a refactor, and give us a compile error rather than letting broken half-refactored code proceed until runtime before giving a meaningless result?

What are Types and Subtypes?

Before we go any further, it makes sense to first discuss what we mean by a "type" and a "sub-type". One way of looking at "types" is as things you know about an expression in your program:

Is the expression a String?
Is the expression a specific string, like "hello"?
Is the expression an Int?
Is the expression a specific Int, like 123?
Is the expression a Seq? A specific Seq like List or Vector?
Is the expression an Any, meaning it could be any of the above?

If we know a expression is a String, we know that it cannot be an Int, and vice versa. These types tell you known facts about a particular expression in your program, allowing the compiler to e.g. check and make sure you're not passing in a String where an Int is required.

Sub-types mean that there are different levels of things you know about a expression in your program:

Any means a expression could be anything: an integer, string, list, etc.
Int means a expression could be any integer from -2^31 to +2^31
123 means a expression can only be the single integer, 123

In this case, we can consider the "type" as something that restricts an expression to a particular range of values, while a sub-type is simply a range of values that is a sub-set of the range allowed by another type.

Sub-typing is a natural way of representing the varying amounts of knowledge you have about various expressions within your program. Different compilers may have different type and subtype hierarchies: e.g. Java doesn't have a type for the exact number 123 while Scala does:

@ val x: 123 = 123
x: Int = 123

@ val x: 123 = 456
cmd54.sc:1: type mismatch;
 found   : Int(456)
 required: 123
val x: 123 = 456
             ^
Compilation Failed

While neither Java nor Scala have a type for arbitrary integer ranges like "any integer from 123 to 456" that are present in some optimizing compilers.

Although it is possible to define a type system that works without subtyping at all, by doing so you lose out in a lot of flexibility in modeling the varying amounts of knowledge you have about expressions within your program.

The Liskov Substitution Principle

The Liskov Substitution Principle (LSP) is stated as follows:

Let phi(x) be a property provable about objects x of type T. Then phi(y) should be true for objects y of type S where S is a subtype of T.

...if S is a subtype of T, then objects of type T may be replaced with objects of type S without altering any of the desirable properties of the program (correctness, task performed, etc.).

Wikipedia

That means if I have a method that "does things" on a Seq, I should be able to have it do things on a List or a Vector and have the method work fine, given that List and Vector are subtypes of Seq. This is the fundamental principle that underlies all type systems with sub-typing, but should mostly be a formalization of your existing intuition for how sub-typing works.

In dynamicl languages without a type-checker, violations of LSP often result in runtime exceptions such as NameErrors (Python) or TypeErrors (Javascript) when you accidentally try to perform an operation (e.g. call a method) on a value that does not support it.

One point of ambiguity is what is meant by "desirable properties of a program". This is a point that we will re-visit again later.

Strawman: Exact-Type Matches

Coming back to our unsafe-equality problem, one immediate solution is to try and enforce == to only be callable when the exact same type is available on both the left and the right are exactly equal. One issue is that this constraint is not expressible in the Scala language, or in any similar language with subtyping. But let us imagine we could define such a function, with made up syntax:

abstract class Any{
  final def ==(that: SelfType): Boolean
}

Where SelfType is a magical parameter type that only allows the same type as the receiver of the method. That would allow our validateUserId1 method to work:

@ def validateUserId1(x: Int, y: Int) = {
    println(x == y)
  }

@ validateUserId1(1, 1)
true

@ validateUserId1(1, 123)
false

But reject validateUserId2:

@ def validateUserId2(x: Int, y: String) = {
    println(x == y)
  }
// Compile error, cannot compare `Int == String`

This immediately bumps into an issue where you realize that values of different types can sometimes be equal. For example, we may write a function to check whether a List and a Vector are equal:

@ def checkEqualSequences(existing: List[Int], value: Vector[Int]) = {
    println(existing == value)
  }
defined function checkEqualSequences

@ checkEqualSequences(List(1, 2, 3), Vector(1, 2, 3))
true

List and Vector are two separate types that share a common supertype, Seq, but are otherwise unrelated. In fact, every Scala type shares a common supertype: Any! The fact that List and Vector share a Seq supertype is not any different from String and Int sharing a Any supertype.

Furthermore, this bumps into a second issue: such a SelfType constraint violates the Liskov Substitution Principle! To see why, let us consider the following definition:

@ def validateAny(x: Any, y: Any) = {
    println(x == y)
  }

@ validateAny(1, 1)
true

@ validateAny(1, "1")
false

Here, validateAny is allowed, because both x and y are of the same type Any, satisfying the SelfType constraint. However, earlier on we saw that narrowing the types of x and y to Int and String (respectively) caused a compile error:

@ def validateUserId2(x: Int, y: String) = {
    println(x == y)
  }
// Compile error, cannot compare `Int == String`

But we have defined that Int and String to be sub-types of Any, and thus we should expect to be able to substitute x: Any, y: Any with x: Int, y: String "without altering any of the desirable properties of the program". While it is arguable whether these "desirable properties" are runtime properties or compile-time properties, it can be argued that going from a program that compiles to a program that does not compile at all would be altering a desirable property!

Thus such a SelfType constraint violates the Liskov Substitution Principle. That isn't in itself the end of the world, but definitely begs the question: so what if we violate it? It turns out there are several existing parts of the Scala language that violate LSP in the same way, which we will look at shortly

Existing Violations of Liskov

Consider the following function:

@ def isInt(x: Any) = x match{
    case i: Int => true
    case _ => false
  }
defined function isInt

@ isInt(1)
res39: Boolean = true

@ isInt("omg")
res40: Boolean = false

isInt takes an argument of type Any, pattern matches on it, and if it's an Int it prints out something different than if it's something else. Perhaps not code you will write every day, or even code you want to write, but not unheard of.

Now consider the following modification:

@ def isInt2(x: String) = x match{
    case i: Int => true
    case _ => false
  }
cmd41.sc:2: scrutinee is incompatible with pattern type;
 found   : Int
 required: String
  case i: Int => true
          ^
Compilation Failed

Here, we have changed the argument type from x: Any to x: String, and our program has stopped compiling! Again, this seemingly violates LSP: because String is a subtype of Any, we expect to be able to substitute it without losing any "desirable property", but in the process our code has stopped compiling!

However, if we wrote our code in a slightly different way, using isInstanceOf rather than pattern matching, this compile error goes away, and both x: Any and x: String are allowed:

@ def isInt3(x: Any) = {
    if (x.isInstanceOf[Int]) true
    else false
  }
defined function isInt3

@ isInt3(1)
res45: Boolean = true

@ isInt3("one")
res46: Boolean = false

@ def isInt4(x: String) = {
    if (x.isInstanceOf[Int]) true
    else false
  }
defined function isInt4

@ isInt4("one")
res48: Boolean = false

isInt and isInt3 are semantically identical: they take in an Any, and may print one of two things depending on whether you pass in an Int. isInt2 fails to compile, but isInt4 always prints one thing and not the other, even though isInt2 and isInt4 are semantically identical.

This comparison should hopefully clarify what the isInt2 compile error is about: it's not complaining because pattern matching to check if a known String is an Int is invalid, rather it is complaining because pattern matching to check if a known String is an Int is valid, but we already know the answer!

The compile error is due to the presence of known-trivial, rather than due to a fundamental type error that violates LSP, which is why by shuffling things around we can write the same code in a way that the compiler's is-code-trivial-or-not heuristic cannot figure out the code. Our code can compile, run perfectly fine at runtime, and produce a valid result, albeit one that is always false because a String is never an Int.

Safety Beyond Liskov Substitution

The fundamental issue here is that the Liskov Substitution Principle ensures safety in one way:

For all possible values of a type, we are allowed to perform the desired operation on it

However, we have another definition of "safety" here, that is illustrated with our problem with equality and pattern matching above:

For all possible values of a type, the result of the desired operation must not be trivially known

Our complaint about universal equality being defined on Any being "unsafe" isn't because it allows us to perform equality checks that are invalid, but rather than it allows us to perform equality checks that are valid but trivial. If we return to our original "invalid" example, validateUserId2:

@ def validateUserId2(x: Int, y: String) = {
    println(x == y)
  }

@ validateUserId2(1, "one")
false

@ validateUserId2(1, "1")
false

@ validateUserId2(123, "123")
false

We never see two values that cannot be compared, but there are plenty of values that when-compared always return false: while false is a perfectly good return value, always returning false likely indicates a bug in our program logic.

Of course, there are also plenty of possible equality checks that always return true; those also are likely to indicate a bug in our program logic!

In general, whenever narrowing types turns a "okay" code snippet to "should be an error" code snippet, that tells you this isn't anything related to the traditional definition of type-safety. Traditional type-safety is based on LSP, and in LSP narrowing the type of an expression in valid code should always preserve the validity of the code. If we're narrowing some types, and the code becomes invalid, then something else is amiss.

This insight tells us a few important things:

It's a Heuristic, not a Law

Since our complaint about universal equality is about triviality, rather than about a traditional sense of type-safety, it depends entirely on what we consider "trivial". For example, earlier we considered that this code is okay:

@ def validateUserId1(x: Int, y: Int) = {
    println(x == y)
  }

The == in validateUserId1 is okay because we do not know what it will return.

But this code was not:

@ def validateUserId2(x: Int, y: String) = {
    println(x == y)
  }

Because the == in validateUserId2 will always return false so we think that's probably a bug.

But what about the following code?

@ def validateUserIdOuter(z: Int) = {
    def validateUserIdInner(x: Int, y: Int) = {
      println(x == y)
    }
    validateUserIdInner(z, z)
  }

A "trivial" analysis will tell us that always returns true, and so is probably a bug as well. If I saw code like this in a code review, I would be certain that it was a mistake.

It is valuable to find (and reject) such trivial code that always produces a known result, but it is impossible to find all such scenarios due to the Halting Problem. Thus trying to reject trivial ==s, trivial isInstanceOfs or pattern matches, etc. is entirely reliant on how much analysis the compiler can do up-front: more is better, but it's theoretically impossible to get to complete coverage.

It's not just Universal Equality

There are a lot of possible operations where "trivial" code is probably a bug. Earlier we saw trivial ==s, trivial isInstanceOfs and pattern-matches, but there are others. For example, Seq.contains:

@ def checkInSeq(xs: Seq[Any], y: Any) = {
    xs.contains(y)
  }

checkInSeq seems fine: it may return true or false depending on what is passed in. The next snippet on the other hand:

@ def checkInSeq2(xs: Seq[Int], y: String) = {
    xs.contains(y)
  }

checkInSeq2 seems invalid, because we know it always returns false. And it became "invalid" when we narrowed the types from xs: Seq[Any], y: Any to xs: Seq[Int], y: String, just like the validateUserId functions we saw earlier.

There are many other non-contains non-==-related cases

(x: Option[T]).isEmpty // This is normal
Some(1).isEmpty // This always returns false, just like Seq.contains!
None.isEmpty  // This always returns true

(x: Int) == 1 // This is normal 
2 == 1 // This always returns false
1 == 1 // This always returns true

(x: Seq[Int]).isInstanceOf[List[Int]] // this is typical
(x: Vector[Int]).isInstanceOf[List[Int]] // this always returns false, just like Seq.contains

(x: Any) match{case i: Int => true case _ => false} // this is typical
(x: String) match{case i: Int => true case _ => false} // this always returns false, just like Seq.contains, and is blocked by the compiler, but violates LSP!!!
((x: String): Any) match{case i: Int => true case _ => false} // the fact that the above fails to compile may not sound unreasonable, until you realize that by up-casting to `: Any`, it can compile again!

(x: Any).getClass // this is typical, can return anything
(x: CharSequence).getClass // this is more specific than above, can return fewer things
(x: String).getClass  // this can only ever return classOf[String], 
                      // just like Seq.contains sometimes only ever returns false!

In each set of cases, we have an operation on a wide type that is expected, but the same operations a narrow type trivially results in a single known result, and is probably a bug. All of these are in perfect accordance to LSP, but often indicate bugs or inefficiencies in the program.

At various points in time, I have certainly been bitten by mistakes that look like all these cases above: a function that like validateUserIdInner that looks fine but is only called from one place returning a trivial result, mis-matched isInstanceOf checks that always returned false, or x == y checks which are of the same type but always return false because x and y are trivially always the same two (different) integers.

While universal equality seems to get the most attention, all of these are really the same problem: if we are writing code that produces a trivial result, it is probably a bug and should be flagged or disallowed.

What makes a Trivial Expression?

So far, we have seen that in many cases, the problematic "trivial" expressions are those that can be "trivially" evaluated to a constant, even at compile time. But there is another name for a partial evaluator: a typechecker! As Naftoli Gugenheim has pointed out in discussion, trying to partial evaluate booleans or other constants is nothing more than typechecking with singleton types or constant-folding. Scala already supports that.

Three things are required for identifying such problematic “trivially constant” snippets:

he code in question is non-trivial: i.e. not just a single primitive literal. This is easy to check.
The code in question typechecks to a singleton type: we already have the ability to do this.
The code in question is pure: we do not currently have the ability to track this in the compiler

The only thing we need to add is some rudimentary purity analysis to catch problematic cases of ==, and perhaps some annotations @pure to handle things like .contains, None.isEmpty, 2 == 1, and so on. That would give us all the safety we wanted, while being a much less invasive change than the other proposals I have seen.

(@pure annotations can lie, but then again so can generic types due to erasure. In the end you have to decide who you want to trust)

What about Constant Folding?

One problematic case with prohibiting trivial computations is that sometimes we want them. Consider the following snippet:

def thirdOfCircle = math.Pi * 2 / 3
def timeToWaitInMillis = 5 * 60 * 1000

Here, we may want Math.Pi * 2 / 3 to constant-fold to a single number for performance, but we do not want to write the number directly because math.Pi * 2 / 3 is much more informative than the literal 2.0943951023931953.

One possible solution is to require trivially-constant values to be wrapped in a marker method:

def thirdOfCircle = const{ Math.Pi * 2 / 3 }
def timeToWaitInMillis = const{ 5 * 60 * 1000 }

where const{...} tells the compiler “I know this is a trivial computation, but let me do it”.

Providing such a marker function and making people use it may be an overall boost to readability overall: as a reader, I can immediately identify which portions of the code compute constants, rather than having to run a constant-folder in my head to try and figure out which snippets of code are going to be optimized away into constants.

This way, I get constant folding within const blocks, no constant folding outside, and both the compiler and programmer are 100% clear where constant folding exists and where it doesn’t.

Conclusion

The traditional definition of type-safety, based on types, subtyping and LSP, is deeply ingrained in almost all statically-typed languages. It allows us to prohibit at-compile-time anyone accidentally trying to perform an operation on an expression which might not support it. However, LSP says nothing about another type of issue: that in which a programmer writes code with trivially known results, which is likely to indicate an error or oversight.

Essentially, if you are doing something non-trivial to compute a trivial result, it's probably an programmer error. Universal equality is just one common case of that.

We have seen that this is an issue that goes far beyond just universal equality checks. It thus makes sense that we want a similarly general approach to mitigate this: although the Halting Problem means that it is theoretically impossible to find all such trivial results at compile time, an approach based on finding trivial expressions handle most disparate cases without having special-case handling for == or contains.

In fact, we already have most of the machinery necessary in the compiler: singleton type inference! We just need to extend it slightly to let it serve double-duty catching these trivial expressions.

It is for that reason that I do not think the proposals for Multiversal Equality or an implicit-based invariant Seq.contains are the right way to go: they're each a relatively-specific hack for a relatively-specific manifestation of this problem, neither solving the core issue nor being based upon any strong theoretical foundation. Each adds a lot of additional rules to the relatively simple core concept of subtyping, along with their own edge cases, ergonomic issues, and complexity. Using invariant collections for a type-safe .contains method has also been proposed, but that only works by throwing away sub-typing among the collections hierarchy, and doesn't address the myriad of non-collection-related cases where this problem exists.

The end goal of this should be clear: using the ability to infer singleton types to identify trivial snippets of code that are likely to be programmer errors. We should be able to re-use most of the existing infrastructure around singleton types, without needing invasive changes to the language or core libraries. Adding hints to the core language or type system to assist in the inference isn't out of the question as long as we keep in mind that they are hints to aid in inference, and not a whole separate subsystem on their own.

In a way, knowing that the Halting Problem prevents a 100% general solution is liberating: we know that we will never get to "perfect", and so we only need to provide a level of "good enough" bug-finding that we can improve over time, with a balance of performance, usability and ergonomics.