Haoyi's Programming Blog

Benchmarking Scala Collections

Posted 2016-09-29

This post will dive into the runtime characteristics of the Scala collections library, from an empirical point of view. While a lot has been written about the Scala collections from an implementation point of view (inheritance hierarchies, CanBuildFrom, etc...) surprisingly little has been written about how these collections actually behave under use.

Are Lists faster than Vectors for what you're doing, or are Vectors faster than Lists? How much memory can you save by using un-boxed Arrays to store primitives? When you do performance tricks like pre-allocating arrays or using a while-loop instead of a foreach call, how much does it really matter? var l: List or val b: mutable.Buffer? This post will tell you the answers.


Easy Parsing with Parser Combinators

Posted 2016-09-08

Parsing structured text into data structures has always been a pain. If you are like me, you may have wondered: why do all the parsing tools seem to be parser-generators instead of just configurable-parsers? After all, when you look at, say, a 2D physics library like Chipmunk2D, you just get a bunch of classes and functions you can call. In contrast, parsing libraries like YACC or ANTLR often seem to require custom build steps, compile-time source-code-generation, and other confusing things which you never see in most "normal" libraries.

It turns out, simple libraries do exist for parsing, under the name "Parser Combinators". While not as well known, these parser combinator libraries expose a bunch of classes and functions you can use to build a parser in a convenient way: without the boilerplate of hand-written recursive-descent parsing, the fragiity of Regexes, or the complexity of code-gen tools like ANTLR. This post will explore one such library, FastParse, and show how parser combinators can make the process of parsing structured text simple, easy and fun.


From first principles: Why I bet on Scala.js

Posted 2016-08-10

Three years ago, I downloaded the nascent Scala.js compiler and tried to use it on a toy project.

Since then, it has matured greatly: the compiler itself is rock-solid. It has a huge ecosystem of libraries. It has a vibrant community, been adopted by some of the largest commercial users of the Scala language, and is playing a key role in shaping evolution of the language. By any measure, it is a success, and I was one of the key people who evangelized it and built foundations for the open-source community and ecosystem that now exists.

However, three years ago in late 2013, when I first got involved in the project, things were different. The compiler was unstable, buggy, slow, and generated incredibly bloated, inefficient Javascript. No ecosystem, no libraries, no tools, nobody using it, no adoption in enterprise. Clearly, a lot of the things people now see as reasons to use Scala.js did not apply back then. So why did I put in the effort I did? This post will explore the reasons why.


Scala Scripting and the 15 Minute Blog Engine

Posted 2016-07-30

The Scala programming language has traditionally been a tool used for building "serious business" systems: compilers, big data, distributed systems or large web applications. With the advent of Scala.js, people are starting to use it for front-end Web work, while the Ammonite-REPL has turned it into a pleasant interactive experience.

This post will explore the new Scala Scripting functionality in the Ammonite project, and use it in the context of creating your own DIY blog engine in 15 minutes. We'll see how it compares to both the status-quo Scala programming experience, other scripting languages like Python or Bash, and what place it can find in your Scala programming toolbox.


Build your own Command Line with ANSI escape codes

Posted 2016-07-02

Everyone is used to programs printing out output in a terminal that scrolls as new text appears, but that's not all your can do: your program can color your text, move the cursor up, down, left or right, or clear portions of the screen if you are going to re-print them later. This is what lets programs like Git implement its dynamic progress indicators, and Vim or Bash implement their editors that let you modify already-displayed text without scrolling the terminal.

There are libraries like Readline, JLine, or the Python Prompt Toolkit that help you do this in various programming languages, but you can also do it yourself. This post will explore the basics of how you can control the terminal from any command-line program, with examples in Python, and how your own code can directly make use of all the special features the terminal has to offer.


Strategic Scala Style: Designing Datatypes

Posted 2016-06-15

When programming in Scala, there are two main ways of avoiding repetition: you can define functions to represent commonly-used procedures or computations, and you can define data-types, e.g. using classes or case classes, to represent commonly-used bundles of data that you tend to pass around or use together.

Lots of people have opinions about functions: they should be "pure", not too long, not be indented more than this much, etc. etc. etc.. Much less has been written about what a good data-type looks like, even though they play just as important a role in your Scala codebase. This post will explore some of the considerations and guidelines I follow when designing the classes and case classes that make up my Scala programs, and how you can apply them to your own Scala code.


Micro-optimizing your Scala code

Posted 2016-05-30

"Micro-optimization" is normally used to describe low-level optimizations that do not change the overall structure of the program; this is as opposed to "high level" optimizations (e.g. choosing efficient algorithms, caching things, or parallelizing things) that often require broader changes to your code. Things like removing intermediate objects to minimize memory allocations, or using bit-sets rather than HashSets to speed up lookups, are examples of micro-optimizations.

Micro-optimization has a bad reputation, and is especially uncommon in the Scala programming language where the community is more interested in other things such as proofs, fancy usage of static types, or distributed systems. Often, it is viewed as a maintainability cost with few benefits. This post will demonstrate the potential benefit of micro-optimizations, and how it can be a valuable technique to have in your toolbox of programming techniques.


Strategic Scala Style: Practical Type Safety

Posted 2016-04-30

This post explores how you can make use of the type-safety of the Scala programming language to help catch the mistakes you make when writing Scala programs.

While Scala is has a compiler that can help you catch errors, and many call it "type-safe", there is in fact a whole range of ways you can write Scala that provide greater- or lesser- amounts of safety. We will discuss various techniques that you can use to shift your code to the "safer" side of the spectrum. We'll consciously ignore the theoretical side of things with it's absolute proofs and logic, and focus on the practical side of how to make the Scala compiler catch more of your dumb bugs.

This is the third in the Strategic Scala Style series (after Principle of Least Power and Conciseness & Names).


Planning Bus Trips with Python & Singapore's Smart Nation APIs

Posted 2016-03-31

The Singapore Smart Nation initiative is a government push to try and improve the efficiency of Singapore, as a country, using technology. Among other projects, there has been a push to make real-time data openly available so that everyone, from individuals to corporations, can use it creatively to make solutions to day-to-day municipal problems.

This post will explore one such dataset: the LTA Data Mall, by Singapore's Land Transport Authority. This dataset provides both offline geographical data on roads & public transport, as well as real-time data on things like bus arrivals and taxis. Using this dataset, the Python programming language, and basic programming and data-science techniques, we will build a trip planner to find the shortest bus commute from A to B, but powered by real data and bounded by real-world limitations.

From registering an API key, fetching data from an endpoint, sanitizing and understanding the data, implementing algorithms like a Breadth First Search or Dijkstra's Algorithm, refining the search, and finally evaluating its ability to plan useful and correct bus trips. You'll get a full tour of the process involved in making good use of public datasets!


Diving Into Other People's Code

Posted 2016-03-15

This post will walk you through an exercise in diving into someone else's code. The goal will be to make an arbitrary change to the code of the Spyder Python IDE, a project I have never touched before in my life, and learn just enough about it to accomplish what I want without getting bogged down. You will learn how to approach problems without the rigour taught in formal education, and instead with guesswork, experimentation, and insight learned in a professional environment. You will see first-hand the joys, sorrows and frustrations trying to navigate the project, culminating in a working (if rough) patch adding a feature to a large, unfamiliar codebase.

Everyone who learns programming has written a pile of code in a bunch of different programs: whether it's implementing algorithms for problem sets, building websites, or making video games. Being a professional software engineer, on the other hand, very often does not involve "writing lots of code"! More often, you spend your time spelunking deep into other peoples code: code you do not understand, did not write, and have possibly never seen before in your life. You have no-one to ask for help, no-one to hear you scream, and yet you have to make forward progress.


What's in a Build Tool?

Posted 2016-03-04

A "Build tool" is a catch-all term that refers to anything that is needed to get a piece of software set up, but isn't needed after that. Different programming communities have a wealth of different tools: some use stalwarts like make, some use loose collections of .sh scripts, some use XML-based tools like Maven or Ant, JSON-based tools like Grunt, or code-based tools like Gulp, Grunt or SBT.

Each of these tools does different things and has a different set of trade-offs associated with them. Given all these different designs, and different things each tool does, what are the common features that build tools provide that people want? How to existing tools stack up against the things that people want build tools to do?


Strategic Scala Style: Conciseness & Names

Posted 2016-02-22

"Naming things" is one of those traditionally "hard problems" in software engineering. The Scala programming language gives you more tools than most languages do to manage names: apart from picking alphanumeric names of arbitrary length, you can also name things using operators, or in many cases not names things at all using language features like apply or the _ placeholder parameter.

However, the fact that code ends up "too concise" is itself one of the most common complaints leveled against the Scala programming language. How can we pick the right balance of verbosity and conciseness, at the right times, to ensure future maintainers of our software do not end up hating us?


Talks I've Given

Posted 2016-02-14

I've given a bunch of talks at meetup groups, industry conferences and academic workshops. Most of them are about my work in the Scala programming language. The actual recordings for these are slightly scattered, over a mix of Youtube videos, Vimeo, and some conference sites.

Here's a consolidated list of their abstracts and videos, most recent first. I'll keep this updated as time goes on.


Strategic Scala Style: Principle of Least Power

Posted 2016-02-14

The Scala language is large and complex, and it provides a variety of tools that a developer can use to do the same thing in a variety of ways. Given the range of possible solutions to every problem, how can a developer choose which one should be used? This is the first in a series of blog posts aiming to provide style guidelines at a "strategic" level. Above the level of "how much whitespace should I use" or camelCase vs PascalCase, it should help a developer working with the Scala language choose from the buffet of possible solutions.


Code Reviewing My Earliest Surviving Program

Posted 2016-01-08

Programmers usually view old, legacy code with a mix of fear and respect. As a professional software engineer, you often have to deal with systems written long before you turned up. You have to dive into them, understand them, review them, and improve them. You often wonder...

Why did they do this


What were they thinking?

Sometimes to realize later that the they is in fact, yourself!


Hello World Blog

Posted 2016-01-07

I have a new programming blog, and you're looking at it! I'll post things about programming on it and there'll be a comment box at the bottom of each post if anyone wants to discuss the things I posted. I'm currently working with the Scala, Python and Javascript programming languages, and am interested in static analysis, compilers, web development and developer tools.

I am not the first person to have a blog, nor the first person to have a programming blog, nor is this my first blog. Nevertheless, for me, doing this is a mix of new and old ideas.