Haoyi's Programming Blog

12 years of the com.lihaoyi Scala Platform

Posted 2024-06-09

The com.lihaoyi platform is an ecosystem of Scala tools and libraries that allow you to write Scala in a way that is easy, rather than powerful. This post discusses the history of the com.lihaoyi platform, how it came about and evolved over the past decade, and what role I think it may play in the future of the Scala language.

ScalaSql: a New SQL Database Query Library for the com-lihaoyi Scala Ecosystem

Posted 2024-01-15

This blog post discusses the introduction of the ScalaSql query library. Why it is needed, what it brings to the table over the many existing database query libraries in the Scala ecosystem, and what makes it fit nicely into the com-lihaoyi philosophy of "Executable Scala Pseudocode that's Easy, Boring, and Fast"

So, What's So Special About The Mill Scala Build Tool?

Posted 2023-09-18

Mill is a Scala build tool that offers an alternative to the venerable SBT toolchain. Mill aims for simplicity by reusing concepts you are already familiar with, borrowing ideas from Functional Programming and modern tools like Bazel. Feedback from users of Mill is often surprisingly positive, with people saying it is "intuitive" or feels "just right".

In this post, we'll explore why Mill is such an intuitive build tool. We'll get familiar with Mill, do a deep dive into the core ideas of what Mill is all about, and finally discuss how these core design decisions build up to a surprisingly intuitive and flexible user experience that is much more than the sum of its parts.

com.lihaoyi Scala: Executable Pseudocode that's Easy, Boring, and Fast

Posted 2023-06-11

Python has a reputation as "Executable Pseudocode": code that fits just as easily on a whiteboard during a discussion, as it does in a codebase deploying to production. Scala is a language that can be just as concise a pseudocode as Python, and arguably better at the "executable" part: faster, safer, and with better tooling. This blog post will explore how the Scala libraries from the com.lihaoyi ecosystem allows the use of Scala as Executable Pseudocode, due to their unique design philosophy that stands out amongst the rest of the Scala ecosystem.

How I Self-Published My First Technical Book

Posted 2021-12-13

Last year I wrote and self-published my first technical book Hands-on Scala Programming. It has gone on to be relatively successful, selling thousands copies, both digital and physical. This blog post will explore how the process of writing and publishing Hands-on Scala went: from its inception, to writing, editing, and finally publishing in multiple formats and mediums.

Reflecting on Four Years at Databricks

Posted 2021-09-25

Today marks my 4th year anniversary at my current employer, Databricks. It's been a good run, and this is a great chance to reflect on the journey so far. This post is as much for me to organize my own thoughts as it is for anyone else to read.

This is a bit of change from the normal programming content on this blog, but given that my job is a programming job, writing lots of Scala, I suppose it still fits!

Introducing the com-lihaoyi Github Organization

Posted 2021-02-23

This is a short blog post to introduce the new com-lihaoyi Github organization: a place for all of my most production-ready projects will live to be easily maintained and discovered. We will discuss the status quo, what is changing, and what I hope to achieve with this change.

From First Principles: Why Scala?

Posted 2021-02-05

Scala, first appearing in 2004, is neither an old stalwart nor a new player in the programming language market. This post will discuss the unique combination of features that Scala provides and how it compares to other languages on the market, diving beneath the superficial experience to explore the fundamentals of the language. From this, you will learn why you might consider including Scala as a valuable addition to your programming toolbox.

Message-based Parallelism with Actors

Posted 2020-12-02

This blog post is a chapter 16 excerpt from the book Hands-on Scala Programming

class SimpleUploadActor()(implicit cc: castor.Context) extends castor.SimpleActor[String]{
  def run(msg: String) = {
    val res = requests.post("https://httpbin.org/post", data = msg)
    println("response " + res.statusCode)
  }
}

Snippet 16.1: a simple actor implemented in Scala using the Castor library

Message-based parallelism is a technique that involves splitting your application logic into multiple "actors", each of which can run concurrently, and only interacts with other actors by exchanging asynchronous messages. This style of programming was popularized by the Erlang programming language and the Akka Scala actor library, but the approach is broadly useful and not limited to any particular language or library.

This chapter will introduce the fundamental concepts of message-based parallelism with actors, and how to use them to achieve parallelism in scenarios where the techniques we covered in Chapter 13: Fork-Join Parallelism with Futures cannot be applied. We will first discuss the basic actor APIs, see how they can be used in a standalone use case, and then see how they can be used in more involved multi-actor pipelines. The techniques in this chapter will come in useful later in Chapter 18: Building a Real-time File Synchronizer.

The Death of Hype: What's Next for Scala

Posted 2020-04-10

A recent tweet by a friend of mine noted how the public interest in the Scala programming language seems to have plateaued or waned, which matches my feeling of the latest trends and zeitgeist. This blog post will go into why I think that has happened, where Scala stands now, and what the future holds for the Scala community.

Standardizing IO Interfaces for Scala Libraries

Posted 2019-12-28

The latest version of my open source libraries have standardized on two new interfaces - Writable and Readable - that allow efficient streaming data exchange between libraries. This blog post will explore the origin of these two interfaces, what purpose they serve, and how they can enable more efficient inter-operabilty between a wide range of different libraries and frameworks.

Beyond Liskov: Type Safe Equality in Scala

Posted 2019-12-15

The Scala language, following its Java heritage, allows you to compare any two values for equality regardless of their respective types. This can be very error prone, since a refactor that changes the type of one of your values may silently result in an equality check that can never return true. This blog post explores how a "safer" equality check is just one of many features that go against the principles that underlie any type system with subtyping, and achieving it requires a fundamental re-thinking of what it means to be "type safe".

How an Optimizing Compiler Works

Posted 2019-11-09

Optimizing compilers are a mainstay of modern software: allowing a programmer to write code in a language that makes sense to them, while transforming it into a form that makes sense for the underlying hardware to run efficiently. The optimizing compiler's job is to figure out what your input program does, and create an output program that it knows will do the same thing, just faster.

This post will walk through some of the core inference techniques within an optimizing compiler: how to model a program in a way that is easy to work with, infer facts about your program, and using those facts to make your program smaller and faster.

Working with Databases using Scala and Quill

Posted 2019-10-09

Most modern systems are backed by relational databases. This tutorial will walk you through the basis of using a relational database from Scala, using the Quill query library. We will work through small self-contained examples of how to store and query data within a Postgres database, and finish with converting an interactive chat website to use a Postgres database for data storage.

Scraping Websites using Scala and Jsoup

Posted 2019-09-28

Not every website exposes their data through a JSON API: in many cases the HTML page shown to users is all you get. This tutorial will walk you through using Scala to scrape useful information from human-readable HTML pages, unlocking the ability to programmatically extract data from online websites or services that were never designed for programmatic access via an API.

Simple Web and Api Servers with Scala

Posted 2019-09-21

Web and API servers are the backbone of internet systems: they provide the basic interface for computers to interact over a network, especially at the boundary between different companies and organizations. This tutorial will teach you the basics of setting up a simple HTTP server in Scala to serve Web and API requests, and walk you through a complete example of building a simple real-time chat website serving both HTML web pages and JSON API endpoints.

Build your own Programming Language with Scala

Posted 2019-09-11

One strength of Scala is implementing programming languages. Even if your goal is not to implement an entirely new programming language, these techniques are still useful: for writing linters, program analyzers, query engines, and other such tools. This blog post will walk you through the process of implementing a simple programming language in Scala, introduce some of the fundamental concepts involved, and finishing with a working interpreter for a simple programming language.

Easy Parallel Programming with Scala Futures

Posted 2019-09-03

The Scala programming language comes with a Futures API. Futures make parallel programming much easier to handle than working with traditional techniques of threads, locks, and callbacks. This blog post dives into Scala's Futures: how to use them, how they work, and how they can give you much more flexibility to leverage parallelism in your code.

How to create Build Pipelines in Scala

Posted 2019-08-02

Build pipelines are a common pattern, where you have files and assets you want to process but want to do so efficiently and incrementally. Usually that means only re-processing files when they change, and otherwise re-using the already-processed assets as much as possible. This blog post will walk through how to use the Mill build tool to set up these build pipelines, using a real-world use case, and demonstrate the advantages a build pipeline gives you over a naive build script.

How to work with JSON in Scala

Posted 2019-06-26

JSON is one of the most common data interchange formats: a human-readable way of exchanging structured data that is ubiquitous throughout industry. This tutorial will walk you through how to work effectively with JSON data in Scala, walking through a few common workflows on a piece of real-world JSON data.

How to work with HTTP JSON APIs in Scala

Posted 2019-06-15

JSON HTTP APIs have become the standard for any organization exposing parts of their system publicly for external developers to work with. This tutorial will walk you through how to access JSON HTTP APIs in Scala, building up to a simple use case: migrating Github issues from one repository to another using Github's public API.

How to work with Subprocesses in Scala

Posted 2019-06-06

Most complex systems are made of multiple processes: often the tool you need is not easily usable as a library within your program, but can be easily started as a subprocess to accomplish the task you need it to do. This tutorial will walk through how to easily work with such subprocesses from the Scala programming language, to allow you to interact with the rich ecosystem of third-party tools and utilities that subprocesses make available.

How to work with Files in Scala

Posted 2019-06-02

Working with files and the filesystem is one of the most common things you do when programming. This tutorial will walk through how to easily work with files in the Scala programming language, in a way that scales from interactive usage in the REPL, to your first Scala scripts, to usage in a production system or application.

Compact, Streaming Pretty-Printing of Hierarchical Data

Posted 2018-12-29

Pretty-printing hierarchical data into a human-readable form is a common thing to do. While a computer doesn't care exactly how you format the same textual data, a human would want to view data that is nicely laid out and indented, yet compact enough to make full use of the width of the output window and avoid wasting horizontal screen space. This post presents an algorithm which achieves optimal usage of horizontal space, predictable layout, and good runtime characteristics: peak heap usage linear in the width of the output window, the ability to start and stop the pretty-printing to print any portion of it, with total runtime linear in the portion of the structure printed.

Automatic Binary Serialization in uPickle 0.7

Posted 2018-11-19

The latest version 0.7 of the uPickle Scala serialization library lets you easily serialize your Scala values to the binary MessagePack format, in addition to the existing JSON serialization format. This gives you the option of compact, high-performance, binary serialization entirely for free, for any value you were previously JSON serializing. This blog post will explore the benefits of binary serialization, and what uPickle brings to the table that's special.

Fastparse 2: Even Faster Scala Parser Combinators

Posted 2018-10-18

This blog post introduces Fastparse 2, a new major version of the FastParse parser-combinator library for Scala. In exchange for some minor tweaks in the public API FastParse 2 gives you parsers that run 2-4x faster on real-world parsers than FastParse 1. This brings Fastparse - already one of the fastest Scala parsing libraries - close to the speed of hand-written parsers.

This blog post will demonstrate the small difference in usage and large difference in performance between Fastparse 1 and Fastparse 2, explore how the major changes to Fastparse's internals enable such an improvement, and discuss some other improvements in usability that appeared in the transition.

Zero-Overhead Tree Processing with the Visitor Pattern

Posted 2018-05-26

The Visitor Pattern is one of the most mis-understood of the classic design patterns. While it has a reputation as a slightly roundabout technique for doing simple processing on simple trees, it is actually an advanced tool for a specific use case: flexible, streaming, zero-overhead processing of complex data structures. This blog post will dive into what makes the Visitor Pattern special, and why it has a unique place in your toolkit regardless of what language or environment you are programming in.

How To Drive Change as a Software Engineer

Posted 2018-04-25

You have an idea. Your boss is indifferent, your team-mates apprehensive, and that other team whose help you need are dubious. You are an individual-contributor with no direct-reports. You still think it's a good idea, but cannot make it happen alone. What next?

Driving change within an technical organization is hard, especially as someone with no rank or authority, but is a skill that can be learned. If you've ever found yourself with an idea but been unsure how to proceed, this post should hopefully give you a good overview of what it takes to conceive, plan & execute such an effort.

uJson: fast, flexible and intuitive JSON for Scala

Posted 2018-03-31

uJson is a new JSON library for the Scala programming language. It serves as the back-end for the uPickle serializaiton library, but can be used standalone to manipulate JSON in a way that is fast, flexible and intuitive, far more than the existing JSON libraries in the Scala library ecosystem. This post will go over what makes uJson an improvement over other JSON libraries that are available, and why you might consider using uJson and uPickle in your next big project.

Mill: Better Scala Builds

Posted 2018-02-22

Mill is a new build tool for Scala: it compiles your Scala code, packages it, runs it, and caches things to avoid doing unnecessary work. Mill aims to be better than Scala's venerable old SBT build tool, learning from it's mistakes and building upon ideas from functional programming to come up with a build tool that is fast, flexible, and easy to understand and use. This post will explore what makes Mill interesting to a Scala developer who is likely already using SBT

Build Tools as Pure Functional Programs

Posted 2018-01-16

The term "build tool" is used to describe a host of different pieces of software, each with their own features, uses and complexity. It is sometimes hard to see the forest for the trees: in the midst of minifying Javascript, compiling C++, or generating protobuf sources, what is a build tool all about?

In this post, I will argue that fundamentally a build-tool models the same thing as a pure functional program. The correspondence between the two is deep, and in studying it we can get new insights into both build-tooling and functional programming.

So, what's wrong with SBT?

Posted 2017-11-19

SBT is the default build tool for the Scala programming community: you can build Scala using other tools, but the vast majority of the community uses SBT. Despite that, nobody seems to like SBT: people say it's confusing, complicated, and opaque. This post will deeply analyze what exactly it is about SBT that people don't like, so we can build a consensus around the problems and a foundation for how we can make things better in future.

FastParse 1.0: Past, Present & Future

Posted 2017-10-14

Fastparse is a Scala library for parsing strings and bytes into structured data. This lets you easily write a parser for any arbitrary textual data formats (e.g. program source code, JSON, ...) and have the parsers run at an acceptable speed, with great error debuggability and error reporting.

This post goes over the history and motivations of Fastparse, and what to expect with the project's recent 1.0.0 release.

uTest: the Essential Test Framework for Scala

Posted 2017-09-21

uTest is a testing framework for the Scala programming language. uTest aims to be both simple and convenient to use, to allow you to focus on what's most important: your tests and your code. This post will explore what makes uTest interesting, and why you should consider using it to build the test suite in your next Scala project.

How to conduct a good Programming Interview

Posted 2017-08-30

Any software engineer who has ever looked for a job has had the interview experience: being locked in a small room for an hour, asked to write code to solve some arbitrary programming task, while being grilled by one or two interviewers whether or why the code they've written is correct.

You can find lots of material online about how to ace the interview as an interviewee, but very little has been written about how to conduct one as the interviewer. This post will lay out a coherent set of principles and cover many concrete steps you can use to make the most out of the next time you find yourself interviewing a potential software-engineering hire.

Scala Vector operations aren't "Effectively Constant" time

Posted 2017-08-08

One oft-repeated fact is that in the Scala language, immutable Vectors are implemented using a 32-way trees. This makes many operations take O(log32(n)) time, and given a maximum size of 2^31 elements, it never takes more than 6 steps to perform a given operation. They call this "effectively constant" time. This fact has found its way into books, blog posts, StackOverflow answers, and even the official Scala documentation.

While this logic sounds good on the surface, it is totally incorrect, and taking the logic even one or two steps further illustrates why. This post will walk through why such logic is incorrect, explore some of the absurd conclusions we can reach if the above logic is taken to be true, and demonstrate why Scala's Vector operations are not "effectively constant" time.

Principles of Automated Testing

Posted 2017-07-15

Automated testing is a core part of writing reliable software; there's only so much you can test manually, and there's no way you can test things as thoroughly or as conscientiously as a machine. As someone who has spent an inordinate amount of time working on automated testing systems, for both work and open-source projects, this post covers how I think of them. Which distinctions are meaningful and which aren't, which practices make a difference and which don't, building up to a coherent set of principles of how to think about the world of automated testing in any software project.

Scala Scripting: Getting to 1.0

Posted 2017-06-13

Ammonite: Scala Scripting is an open-source project that lets you use the Scala programming language for "scripting" purposes: as an interactive REPL, small scripts, or a systems shell. Scala has traditionally been a "heavy, powerful" language with "heavy, powerful" tools, and Ammonite aims to let you use it for small, simple tasks as well.

While people have been using Ammonite continuously over the last two years it's been in development, I've only recently started tagging Release Candidates (RCs) in the run-up to publishing Ammonite 1.0.0. This post will explore how Ammonite has changed over the years, and what's different now as we approach a 1.0 release.

Warts of the Scala Programming Language

Posted 2017-05-27

Scala is my current favorite general-purpose programming language. However, it definitely has its share of flaws. While some are deep trade-offs in the design of the language, others are trivial, silly issues which cause frustration far beyond their level of sophistication: "warts". This post will explore some of what, in my opinion, are the warts of the Scala programming language, to hopefully raise awareness of their existence as problems and build a desire to fix them in the broader community.

Re-imagining the Online Code Explorer

Posted 2017-03-03

Platforms like Github, Bitbucket, and Phabricator all provide ways of browsing and searching your project's source code online, as part of their larger suite of collaboration features. While their overall platform is rich and valuable, my experience with their online code-explorers has always been mediocre and underwhelming.

This post contrasts the experience of exploring code online with that using offline tools, that many would already be familiar with. How exactly is the online browsing experience inferior? Why is that the case? And is it possible to do better? What would "better" look like?

What's Functional Programming All About?

Posted 2017-01-25

There are many descriptions floating around the internet, trying to explain functional programming in simple terms. Unfortunately, most discuss details only loosely related to functional programming, while others focus on topics that are completely irrelevant. So of course, I had to write my own!

This post is my own understanding of what is the "core" of "functional programming", how it differs from "imperative" programming, and what the main benefits of the approach are. As a worked example, we will use a kitchen recipe as a proxy for the more-abstract kind of logic you find in program source code, to try and make concrete what is normally a very abstract topic. That recipe is one of my favorite recipes available online, Michael Chu's Classic Tiramisu.

Implicit Design Patterns in Scala

Posted 2016-12-26

Apart from the old design patterns from the 1990s, the Scala programming language in 2016 has a whole new set of design patterns that apply to it. These are patterns that you see in Scala code written across different organizations, cultures and communities.

This blog post will describe some of these design patterns around the use of Scala implicits: specifically around the use of implicit parameters. Hopefully this should help document some of the more fundamental patterns around how people use implicit parameters "in the wild", and provide some insight into what can easily be a confusing language feature.

Old Design Patterns in Scala

Posted 2016-11-29

A Design Pattern is something you do over and over when building software, but isn't concrete enough to be made into a helper method, class or other abstraction. The "Gang of Four" Design Patterns book popularized the idea, and discussed in detail many of the design patterns which are common in C++, Java and similar "Object Oriented" languages.

The Scala programming language in 2016 is different from languages common 22 years ago. While some of the traditional design patterns still apply, others have changed significantly, and yet others have been entirely superseded by new language features. This post will explore how some of these old design patterns apply to the Scala programming language.

Benchmarking Scala Collections

Posted 2016-09-29

This post will dive into the runtime characteristics of the Scala collections library, from an empirical point of view. While a lot has been written about the Scala collections from an implementation point of view (inheritance hierarchies, CanBuildFrom, etc...) surprisingly little has been written about how these collections actually behave under use.

Are Lists faster than Vectors for what you're doing, or are Vectors faster than Lists? How much memory can you save by using un-boxed Arrays to store primitives? When you do performance tricks like pre-allocating arrays or using a while-loop instead of a foreach call, how much does it really matter? var l: List or val b: mutable.Buffer? This post will tell you the answers.

Easy Parsing with Parser Combinators

Posted 2016-09-08

Parsing structured text into data structures has always been a pain. If you are like me, you may have wondered: why do all the parsing tools seem to be parser-generators instead of just configurable-parsers? After all, when you look at, say, a 2D physics library like Chipmunk2D, you just get a bunch of classes and functions you can call. In contrast, parsing libraries like YACC or ANTLR often seem to require custom build steps, compile-time source-code-generation, and other confusing things which you never see in most "normal" libraries.

It turns out, simple libraries do exist for parsing, under the name "Parser Combinators". While not as well known, these parser combinator libraries expose a bunch of classes and functions you can use to build a parser in a convenient way: without the boilerplate of hand-written recursive-descent parsing, the fragility of Regexes, or the complexity of code-gen tools like ANTLR. This post will explore one such library, FastParse, and show how parser combinators can make the process of parsing structured text simple, easy and fun.

From first principles: Why I bet on Scala.js

Posted 2016-08-10

Three years ago, I downloaded the nascent Scala.js compiler and tried to use it on a toy project.

Since then, it has matured greatly: the compiler itself is rock-solid. It has a huge ecosystem of libraries. It has a vibrant community, been adopted by some of the largest commercial users of the Scala language, and is playing a key role in shaping evolution of the language. By any measure, it is a success, and I was one of the key people who evangelized it and built foundations for the open-source community and ecosystem that now exists.

However, three years ago in late 2013, when I first got involved in the project, things were different. The compiler was unstable, buggy, slow, and generated incredibly bloated, inefficient Javascript. No ecosystem, no libraries, no tools, nobody using it, no adoption in enterprise. Clearly, a lot of the things people now see as reasons to use Scala.js did not apply back then. So why did I put in the effort I did? This post will explore the reasons why.

Scala Scripting and the 15 Minute Blog Engine

Posted 2016-07-30

The Scala programming language has traditionally been a tool used for building "serious business" systems: compilers, big data, distributed systems or large web applications. With the advent of Scala.js, people are starting to use it for front-end Web work, while the Ammonite-REPL has turned it into a pleasant interactive experience.

This post will explore the new Scala Scripting functionality in the Ammonite project, and use it in the context of creating your own DIY blog engine in 15 minutes. We'll see how it compares to both the status-quo Scala programming experience, other scripting languages like Python or Bash, and what place it can find in your Scala programming toolbox.

Build your own Command Line with ANSI escape codes

Posted 2016-07-02

Everyone is used to programs printing out output in a terminal that scrolls as new text appears, but that's not all your can do: your program can color your text, move the cursor up, down, left or right, or clear portions of the screen if you are going to re-print them later. This is what lets programs like Git implement its dynamic progress indicators, and Vim or Bash implement their editors that let you modify already-displayed text without scrolling the terminal.

There are libraries like Readline, JLine, or the Python Prompt Toolkit that help you do this in various programming languages, but you can also do it yourself. This post will explore the basics of how you can control the terminal from any command-line program, with examples in Python, and how your own code can directly make use of all the special features the terminal has to offer.

Strategic Scala Style: Designing Datatypes

Posted 2016-06-15

When programming in Scala, there are two main ways of avoiding repetition: you can define functions to represent commonly-used procedures or computations, and you can define data-types, e.g. using classes or case classes, to represent commonly-used bundles of data that you tend to pass around or use together.

Lots of people have opinions about functions: they should be "pure", not too long, not be indented more than this much, etc. etc. etc.. Much less has been written about what a good data-type looks like, even though they play just as important a role in your Scala codebase. This post will explore some of the considerations and guidelines I follow when designing the classes and case classes that make up my Scala programs, and how you can apply them to your own Scala code.

Micro-optimizing your Scala code

Posted 2016-05-30

"Micro-optimization" is normally used to describe low-level optimizations that do not change the overall structure of the program; this is as opposed to "high level" optimizations (e.g. choosing efficient algorithms, caching things, or parallelizing things) that often require broader changes to your code. Things like removing intermediate objects to minimize memory allocations, or using bit-sets rather than HashSets to speed up lookups, are examples of micro-optimizations.

Micro-optimization has a bad reputation, and is especially uncommon in the Scala programming language where the community is more interested in other things such as proofs, fancy usage of static types, or distributed systems. Often, it is viewed as a maintainability cost with few benefits. This post will demonstrate the potential benefit of micro-optimizations, and how it can be a valuable technique to have in your toolbox of programming techniques.

Strategic Scala Style: Practical Type Safety

Posted 2016-04-30

This post explores how you can make use of the type-safety of the Scala programming language to help catch the mistakes you make when writing Scala programs.

While Scala is has a compiler that can help you catch errors, and many call it "type-safe", there is in fact a whole range of ways you can write Scala that provide greater- or lesser- amounts of safety. We will discuss various techniques that you can use to shift your code to the "safer" side of the spectrum. We'll consciously ignore the theoretical side of things with it's absolute proofs and logic, and focus on the practical side of how to make the Scala compiler catch more of your dumb bugs.

This is the third in the Strategic Scala Style series (after Principle of Least Power and Conciseness & Names).

Planning Bus Trips with Python & Singapore's Smart Nation APIs

Posted 2016-03-31

The Singapore Smart Nation initiative is a government push to try and improve the efficiency of Singapore, as a country, using technology. Among other projects, there has been a push to make real-time data openly available so that everyone, from individuals to corporations, can use it creatively to make solutions to day-to-day municipal problems.

This post will explore one such dataset: the LTA Data Mall, by Singapore's Land Transport Authority. This dataset provides both offline geographical data on roads & public transport, as well as real-time data on things like bus arrivals and taxis. Using this dataset, the Python programming language, and basic programming and data-science techniques, we will build a trip planner to find the shortest bus commute from A to B, but powered by real data and bounded by real-world limitations.

From registering an API key, fetching data from an endpoint, sanitizing and understanding the data, implementing algorithms like a Breadth First Search or Dijkstra's Algorithm, refining the search, and finally evaluating its ability to plan useful and correct bus trips. You'll get a full tour of the process involved in making good use of public datasets!

Diving Into Other People's Code

Posted 2016-03-15

This post will walk you through an exercise in diving into someone else's code. The goal will be to make an arbitrary change to the code of the Spyder Python IDE, a project I have never touched before in my life, and learn just enough about it to accomplish what I want without getting bogged down. You will learn how to approach problems without the rigour taught in formal education, and instead with guesswork, experimentation, and insight learned in a professional environment. You will see first-hand the joys, sorrows and frustrations trying to navigate the project, culminating in a working (if rough) patch adding a feature to a large, unfamiliar codebase.

Everyone who learns programming has written a pile of code in a bunch of different programs: whether it's implementing algorithms for problem sets, building websites, or making video games. Being a professional software engineer, on the other hand, very often does not involve "writing lots of code"! More often, you spend your time spelunking deep into other peoples code: code you do not understand, did not write, and have possibly never seen before in your life. You have no-one to ask for help, no-one to hear you scream, and yet you have to make forward progress.

What's in a Build Tool?

Posted 2016-03-04

A "Build tool" is a catch-all term that refers to anything that is needed to get a piece of software set up, but isn't needed after that. Different programming communities have a wealth of different tools: some use stalwarts like make, some use loose collections of .sh scripts, some use XML-based tools like Maven or Ant, JSON-based tools like Grunt, or code-based tools like Gulp, Grunt or SBT.

Each of these tools does different things and has a different set of trade-offs associated with them. Given all these different designs, and different things each tool does, what are the common features that build tools provide that people want? How to existing tools stack up against the things that people want build tools to do?

Strategic Scala Style: Conciseness & Names

Posted 2016-02-22

"Naming things" is one of those traditionally "hard problems" in software engineering. The Scala programming language gives you more tools than most languages do to manage names: apart from picking alphanumeric names of arbitrary length, you can also name things using operators, or in many cases not names things at all using language features like apply or the _ placeholder parameter.

However, the fact that code ends up "too concise" is itself one of the most common complaints leveled against the Scala programming language. How can we pick the right balance of verbosity and conciseness, at the right times, to ensure future maintainers of our software do not end up hating us?

Talks I've Given

Posted 2016-02-14

I've given a bunch of talks at meetup groups, industry conferences and academic workshops. Most of them are about my work in the Scala programming language. The actual recordings for these are slightly scattered, over a mix of Youtube videos, Vimeo, and some conference sites.

Here's a consolidated list of their abstracts and videos, most recent first. I'll keep this updated as time goes on.

Strategic Scala Style: Principle of Least Power

Posted 2016-02-14

The Scala language is large and complex, and it provides a variety of tools that a developer can use to do the same thing in a variety of ways. Given the range of possible solutions to every problem, how can a developer choose which one should be used? This is the first in a series of blog posts aiming to provide style guidelines at a "strategic" level. Above the level of "how much whitespace should I use" or camelCase vs PascalCase, it should help a developer working with the Scala language choose from the buffet of possible solutions.

Code Reviewing My Earliest Surviving Program

Posted 2016-01-08

Programmers usually view old, legacy code with a mix of fear and respect. As a professional software engineer, you often have to deal with systems written long before you turned up. You have to dive into them, understand them, review them, and improve them. You often wonder...

Why did they do this

or

What were they thinking?

Sometimes to realize later that the they is in fact, yourself!

Hello World Blog

Posted 2016-01-07

I have a new programming blog, and you're looking at it! I'll post things about programming on it and there'll be a comment box at the bottom of each post if anyone wants to discuss the things I posted. I'm currently working with the Scala, Python and Javascript programming languages, and am interested in static analysis, compilers, web development and developer tools.

I am not the first person to have a blog, nor the first person to have a programming blog, nor is this my first blog. Nevertheless, for me, doing this is a mix of new and old ideas.