Build pipelines are a common pattern, where you have files and assets you want to process but want to do so efficiently and incrementally. Usually that means only re-processing files when they change, and otherwise re-using the already-processed assets as much as possible. This blog post will walk through how to use the Mill build tool to set up these build pipelines, using a real-world use case, and demonstrate the advantages a build pipeline gives you over a naive build script.
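The "only re-process what changed" idea can be sketched in a few lines of Python. This is a generic illustration using content hashes, not how Mill works internally; the function name `incremental_build` and the in-memory `cache` layout are assumptions made for the example.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def incremental_build(
    sources: Dict[str, bytes],
    cache: Dict[str, Tuple[str, bytes]],
    process: Callable[[bytes], bytes],
) -> List[str]:
    """Process each source, skipping those whose content hash is unchanged.

    `cache` maps a file name to (content hash, processed output) and is
    updated in place. Returns the names that were actually re-processed.
    """
    reprocessed = []
    for name, content in sources.items():
        digest = hashlib.sha256(content).hexdigest()
        cached = cache.get(name)
        # Re-run the (hypothetical) processing step only on a cache miss
        # or when the content hash differs from the cached one.
        if cached is None or cached[0] != digest:
            cache[name] = (digest, process(content))
            reprocessed.append(name)
    return reprocessed
```

Calling `incremental_build` twice with the same sources does the work once; changing a single file causes only that file to be processed again.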
JSON is one of the most common data interchange formats: a human-readable way of exchanging structured data that is ubiquitous throughout industry. This tutorial will walk you through how to work effectively with JSON data in Scala, walking through a few common workflows on a piece of real-world JSON data.
JSON HTTP APIs have become the standard for any organization exposing parts of their system publicly for external developers to work with. This tutorial will walk you through how to access JSON HTTP APIs in Scala, building up to a simple use case: migrating Github issues from one repository to another using Github's public API.
Most complex systems are made of multiple processes: often the tool you need is not easily usable as a library within your program, but can be easily started as a subprocess to accomplish the task you need it to do. This tutorial will walk through how to easily work with such subprocesses from the Scala programming language, to allow you to interact with the rich ecosystem of third-party tools and utilities that subprocesses make available.
Working with files and the filesystem is one of the most common things you do when programming. This tutorial will walk through how to easily work with files in the Scala programming language, in a way that scales from interactive usage in the REPL, to your first Scala scripts, to usage in a production system or application.
Pretty-printing hierarchical data into a human-readable form is a common thing to do. While a computer doesn't care exactly how you format the same textual data, a human would want to view data that is nicely laid out and indented, yet compact enough to make full use of the width of the output window and avoid wasting horizontal screen space. This post presents an algorithm which achieves optimal usage of horizontal space, predictable layout, and good runtime characteristics: peak heap usage linear in the width of the output window, the ability to start and stop the pretty-printing to print any portion of it, with total runtime linear in the portion of the structure printed.
The latest version 0.7 of the uPickle Scala serialization library lets you easily serialize your Scala values to the binary MessagePack format, in addition to the existing JSON serialization format. This gives you the option of compact, high-performance, binary serialization entirely for free, for any value you were previously JSON serializing. This blog post will explore the benefits of binary serialization, and what uPickle brings to the table that's special.
This blog post introduces Fastparse 2, a new major version of the FastParse parser-combinator library for Scala. In exchange for some minor tweaks in the public API, FastParse 2 makes real-world parsers run 2-4x faster than they do on FastParse 1. This brings Fastparse - already one of the fastest Scala parsing libraries - close to the speed of hand-written parsers.
This blog post will demonstrate the small difference in usage and large difference in performance between Fastparse 1 and Fastparse 2, explore how the major changes to Fastparse's internals enable such an improvement, and discuss some other improvements in usability that appeared in the transition.
The Visitor Pattern is one of the most misunderstood of the classic design patterns. While it has a reputation as a slightly roundabout technique for doing simple processing on simple trees, it is actually an advanced tool for a specific use case: flexible, streaming, zero-overhead processing of complex data structures. This blog post will dive into what makes the Visitor Pattern special, and why it has a unique place in your toolkit regardless of what language or environment you are programming in.
You have an idea. Your boss is indifferent, your team-mates apprehensive, and that other team whose help you need are dubious. You are an individual-contributor with no direct-reports. You still think it's a good idea, but cannot make it happen alone. What next?
Driving change within a technical organization is hard, especially as someone with no rank or authority, but it is a skill that can be learned. If you've ever found yourself with an idea but been unsure how to proceed, this post should give you a good overview of what it takes to conceive, plan & execute such an effort.
uJson is a new JSON library for the Scala programming language. It serves as the back-end for the uPickle serialization library, but can be used standalone to manipulate JSON in a way that is fast, flexible and intuitive, far more than the existing JSON libraries in the Scala library ecosystem. This post will go over what makes uJson an improvement over other JSON libraries that are available, and why you might consider using uJson and uPickle in your next big project.
Mill is a new build tool for Scala: it compiles your Scala code, packages it, runs it, and caches things to avoid doing unnecessary work. Mill aims to be better than Scala's venerable old SBT build tool, learning from its mistakes and building upon ideas from functional programming to come up with a build tool that is fast, flexible, and easy to understand and use. This post will explore what makes Mill interesting to a Scala developer who is likely already using SBT.
In this post, I will argue that fundamentally a build-tool models the same thing as a pure functional program. The correspondence between the two is deep, and in studying it we can get new insights into both build-tooling and functional programming.
SBT is the default build tool for the Scala programming community: you can build Scala using other tools, but the vast majority of the community uses SBT. Despite that, nobody seems to like SBT: people say it's confusing, complicated, and opaque. This post will deeply analyze what exactly it is about SBT that people don't like, so we can build a consensus around the problems and a foundation for how we can make things better in future.
Fastparse is a Scala library for parsing strings and bytes into structured data. This lets you easily write a parser for any arbitrary textual data format (e.g. program source code, JSON, ...) and have the parsers run at an acceptable speed, with great error debuggability and error reporting.
This post goes over the history and motivations of Fastparse, and what to expect with the project's recent 1.0.0 release.
uTest is a testing framework for the Scala programming language. uTest aims to be both simple and convenient to use, to allow you to focus on what's most important: your tests and your code. This post will explore what makes uTest interesting, and why you should consider using it to build the test suite in your next Scala project.
Any software engineer who has ever looked for a job has had the interview experience: being locked in a small room for an hour, asked to write code to solve some arbitrary programming task, while being grilled by one or two interviewers about whether and why the code they've written is correct.
You can find lots of material online about how to ace the interview as an interviewee, but very little has been written about how to conduct one as the interviewer. This post will lay out a coherent set of principles and cover many concrete steps you can use to make the most out of the next time you find yourself interviewing a potential software-engineering hire.
One oft-repeated fact is that in the Scala language, immutable Vectors are implemented using 32-way trees. This makes many operations take O(log32(n)) time, and given a maximum size of 2^31 elements, it never takes more than 6 steps to perform a given operation. They call this "effectively constant" time. This fact has found its way into books, blog posts, StackOverflow answers, and even the official Scala documentation.
While this logic sounds good on the surface, it is totally incorrect, and taking the logic even one or two steps further illustrates why. This post will walk through why such logic is incorrect, explore some of the absurd conclusions we can reach if the above logic is taken to be true, and demonstrate why Scala's Vector operations are not "effectively constant" time.
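A small sketch makes the logarithmic growth concrete. It is written in Python for brevity, the helper name `tree_levels` is hypothetical, and it does not model Scala's actual Vector implementation; it only counts how many levels a tree with a given branching factor needs in order to hold n elements.

```python
def tree_levels(n: int, branching: int = 32) -> int:
    """Smallest number of levels d such that branching**d >= n (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    levels = 1
    capacity = branching  # how many elements a tree with `levels` levels can hold
    while capacity < n:
        capacity *= branching
        levels += 1
    return levels


if __name__ == "__main__":
    # The level count is small, but it still grows as n grows.
    for n in (10, 1_000, 100_000, 10_000_000):
        print(n, tree_levels(n))
```

Counting conventions vary (levels versus hops between levels), so the absolute numbers matter less than the fact that they keep increasing with n.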
Automated testing is a core part of writing reliable software; there's only so much you can test manually, and there's no way you can test things as thoroughly or as conscientiously as a machine. I have spent an inordinate amount of time working on automated testing systems, for both work and open-source projects, and this post covers how I think of them: which distinctions are meaningful and which aren't, which practices make a difference and which don't, building up to a coherent set of principles of how to think about the world of automated testing in any software project.
Ammonite: Scala Scripting is an open-source project that lets you use the Scala programming language for "scripting" purposes: as an interactive REPL, small scripts, or a systems shell. Scala has traditionally been a "heavy, powerful" language with "heavy, powerful" tools, and Ammonite aims to let you use it for small, simple tasks as well.
While people have been using Ammonite continuously over the last two years it's been in development, I've only recently started tagging Release Candidates (RCs) in the run-up to publishing Ammonite 1.0.0. This post will explore how Ammonite has changed over the years, and what's different now as we approach a 1.0 release.
Scala is my current favorite general-purpose programming language. However, it definitely has its share of flaws. While some are deep trade-offs in the design of the language, others are trivial, silly issues which cause frustration far beyond their level of sophistication: "warts". This post will explore some of what, in my opinion, are the warts of the Scala programming language, to hopefully raise awareness of their existence as problems and build a desire to fix them in the broader community.
Platforms like Github, Bitbucket, and Phabricator all provide ways of browsing and searching your project's source code online, as part of their larger suite of collaboration features. While their overall platform is rich and valuable, my experience with their online code-explorers has always been mediocre and underwhelming.
This post contrasts the experience of exploring code online with that of using offline tools that many would already be familiar with. How exactly is the online browsing experience inferior? Why is that the case? And is it possible to do better? What would "better" look like?
There are many descriptions floating around the internet, trying to explain functional programming in simple terms. Unfortunately, most discuss details only loosely related to functional programming, while others focus on topics that are completely irrelevant. So of course, I had to write my own!
This post is my own understanding of what is the "core" of "functional programming", how it differs from "imperative" programming, and what the main benefits of the approach are. As a worked example, we will use a kitchen recipe as a proxy for the more-abstract kind of logic you find in program source code, to try and make concrete what is normally a very abstract topic. That recipe is one of my favorite recipes available online, Michael Chu's Classic Tiramisu.
Apart from the old design patterns from the 1990s, the Scala programming language in 2016 has a whole new set of design patterns that apply to it. These are patterns that you see in Scala code written across different organizations, cultures and communities.
This blog post will describe some of these design patterns around the use of Scala implicits: specifically around the use of implicit parameters. Hopefully this should help document some of the more fundamental patterns around how people use implicit parameters "in the wild", and provide some insight into what can easily be a confusing language feature.
A Design Pattern is something you do over and over when building software, but isn't concrete enough to be made into a helper method, class or other abstraction. The "Gang of Four" Design Patterns book popularized the idea, and discussed in detail many of the design patterns which are common in C++, Java and similar "Object Oriented" languages.
The Scala programming language in 2016 is different from languages common 22 years ago. While some of the traditional design patterns still apply, others have changed significantly, and yet others have been entirely superseded by new language features. This post will explore how some of these old design patterns apply to the Scala programming language.
This post will dive into the runtime characteristics of the Scala collections library, from an empirical point of view. While a lot has been written about the Scala collections from an implementation point of view (inheritance hierarchies, CanBuildFrom, etc...) surprisingly little has been written about how these collections actually behave under use.
Are Lists faster than Vectors for what you're doing, or are Vectors faster than Lists? How much memory can you save by using un-boxed Arrays to store primitives? When you do performance tricks like pre-allocating arrays or using a while-loop instead of a foreach call, how much does it really matter? var l: List or val b: mutable.Buffer? This post will tell you the answers.
Parsing structured text into data structures has always been a pain. If you are like me, you may have wondered: why do all the parsing tools seem to be parser-generators instead of just configurable-parsers? After all, when you look at, say, a 2D physics library like Chipmunk2D, you just get a bunch of classes and functions you can call. In contrast, parsing libraries like YACC or ANTLR often seem to require custom build steps, compile-time source-code-generation, and other confusing things which you never see in most "normal" libraries.
It turns out, simple libraries do exist for parsing, under the name "Parser Combinators". While not as well known, these parser combinator libraries expose a bunch of classes and functions you can use to build a parser in a convenient way: without the boilerplate of hand-written recursive-descent parsing, the fragility of Regexes, or the complexity of code-gen tools like ANTLR. This post will explore one such library, FastParse, and show how parser combinators can make the process of parsing structured text simple, easy and fun.
Three years ago, I downloaded the nascent Scala.js compiler and tried to use it on a toy project.
Since then, it has matured greatly: the compiler itself is rock-solid. It has a huge ecosystem of libraries. It has a vibrant community, has been adopted by some of the largest commercial users of the Scala language, and is playing a key role in shaping the evolution of the language. By any measure, it is a success, and I was one of the key people who evangelized it and built foundations for the open-source community and ecosystem that now exists.
The Scala programming language has traditionally been a tool used for building "serious business" systems: compilers, big data, distributed systems or large web applications. With the advent of Scala.js, people are starting to use it for front-end Web work, while the Ammonite-REPL has turned it into a pleasant interactive experience.
This post will explore the new Scala Scripting functionality in the Ammonite project, and use it in the context of creating your own DIY blog engine in 15 minutes. We'll see how it compares to the status-quo Scala programming experience and to other scripting languages like Python or Bash, and what place it can find in your Scala programming toolbox.
Everyone is used to programs printing out output in a terminal that scrolls as new text appears, but that's not all you can do: your program can color your text, move the cursor up, down, left or right, or clear portions of the screen if you are going to re-print them later. This is what lets programs like Git implement their dynamic progress indicators, and Vim or Bash implement their editors that let you modify already-displayed text without scrolling the terminal.
There are libraries like Readline, JLine, or the Python Prompt Toolkit that help you do this in various programming languages, but you can also do it yourself. This post will explore the basics of how you can control the terminal from any command-line program, with examples in Python, and how your own code can directly make use of all the special features the terminal has to offer.
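As a rough sketch of the idea, the snippet below builds a few standard ANSI escape sequences by hand in Python. The helper names (`colored`, `cursor_up`, `clear_line`) are made up for illustration and are not taken from the post; the escape codes themselves are the common ones that most terminals understand.

```python
import sys
import time

ESC = "\u001b["  # Control Sequence Introducer: the ESC character followed by '['
RESET = ESC + "0m"  # resets colors and other text attributes


def colored(text: str, color_code: int) -> str:
    """Wrap text in a foreground color (e.g. 31 = red, 32 = green), then reset."""
    return f"{ESC}{color_code}m{text}{RESET}"


def cursor_up(n: int = 1) -> str:
    """Escape sequence that moves the cursor up n lines."""
    return f"{ESC}{n}A"


def clear_line() -> str:
    """Escape sequence that erases the entire current line."""
    return ESC + "2K"


if __name__ == "__main__":
    # Re-draw a single progress line in place instead of scrolling the terminal.
    for pct in range(0, 101, 25):
        sys.stdout.write("\r" + clear_line() + colored(f"progress: {pct}%", 32))
        sys.stdout.flush()
        time.sleep(0.1)
    sys.stdout.write("\n")
```

Because the helpers only return strings, they can be combined freely and written to the terminal in a single call.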
When programming in Scala, there are two main ways of avoiding repetition: you can define functions to represent commonly-used procedures or computations, and you can define data-types, e.g. using case classes, to represent commonly-used bundles of data that you tend to pass around or use together.
Lots of people have opinions about functions: they should be "pure", not too long, not be indented more than this much, etc. Much less has been written about what a good data-type looks like, even though they play just as important a role in your Scala codebase. This post will explore some of the considerations and guidelines I follow when designing the case classes that make up my Scala programs, and how you can apply them to your own Scala code.
"Micro-optimization" is normally used to describe low-level optimizations that do not change the overall structure of the program; this is as opposed to "high level" optimizations (e.g. choosing efficient algorithms, caching things, or parallelizing things) that often require broader changes to your code. Things like removing intermediate objects to minimize memory allocations, or using bit-sets rather than
HashSets to speed up lookups, are examples of micro-optimizations.
Micro-optimization has a bad reputation, and is especially uncommon in the Scala programming language where the community is more interested in other things such as proofs, fancy usage of static types, or distributed systems. Often, it is viewed as a maintainability cost with few benefits. This post will demonstrate the potential benefit of micro-optimizations, and how it can be a valuable technique to have in your toolbox of programming techniques.
This post explores how you can make use of the type-safety of the Scala programming language to help catch the mistakes you make when writing Scala programs.
While Scala has a compiler that can help you catch errors, and many call it "type-safe", there is in fact a whole range of ways you can write Scala that provide greater or lesser amounts of safety. We will discuss various techniques that you can use to shift your code to the "safer" side of the spectrum. We'll consciously ignore the theoretical side of things with its absolute proofs and logic, and focus on the practical side of how to make the Scala compiler catch more of your dumb bugs.
The Singapore Smart Nation initiative is a government push to try and improve the efficiency of Singapore, as a country, using technology. Among other projects, there has been a push to make real-time data openly available so that everyone, from individuals to corporations, can use it creatively to make solutions to day-to-day municipal problems.
This post will explore one such dataset: the LTA Data Mall, by Singapore's Land Transport Authority. This dataset provides both offline geographical data on roads & public transport, as well as real-time data on things like bus arrivals and taxis. Using this dataset, the Python programming language, and basic programming and data-science techniques, we will build a trip planner to find the shortest bus commute from A to B, but powered by real data and bounded by real-world limitations.
From registering an API key, fetching data from an endpoint, sanitizing and understanding the data, implementing algorithms like a Breadth First Search or Dijkstra's Algorithm, and refining the search, to finally evaluating its ability to plan useful and correct bus trips, you'll get a full tour of the process involved in making good use of public datasets!
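To give a flavour of the search step, here is a minimal Breadth First Search sketch in Python. The stop names and the `routes` adjacency map are invented for illustration; they are not real LTA Data Mall data, and a real planner would also have to account for travel times and transfers.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical toy network: each stop maps to the stops reachable in one hop.
routes: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}


def fewest_hops(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Return a path with the fewest hops from start to goal, or None if unreachable."""
    queue = deque([start])
    came_from: Dict[str, Optional[str]] = {start: None}  # also serves as the visited set
    while queue:
        stop = queue.popleft()
        if stop == goal:
            # Walk the predecessor links back to the start, then reverse.
            path = []
            current: Optional[str] = stop
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        for nxt in graph.get(stop, []):
            if nxt not in came_from:
                came_from[nxt] = stop
                queue.append(nxt)
    return None
```

Breadth First Search treats every hop as equally costly; weighting hops by travel time is where something like Dijkstra's Algorithm would come in.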
This post will walk you through an exercise in diving into someone else's code. The goal will be to make an arbitrary change to the code of the Spyder Python IDE, a project I have never touched before in my life, and learn just enough about it to accomplish what I want without getting bogged down. You will learn how to approach problems without the rigour taught in formal education, and instead with guesswork, experimentation, and insight learned in a professional environment. You will see first-hand the joys, sorrows and frustrations of trying to navigate the project, culminating in a working (if rough) patch adding a feature to a large, unfamiliar codebase.
Everyone who learns programming has written a pile of code in a bunch of different programs: whether it's implementing algorithms for problem sets, building websites, or making video games. Being a professional software engineer, on the other hand, very often does not involve "writing lots of code"! More often, you spend your time spelunking deep into other people's code: code you do not understand, did not write, and have possibly never seen before in your life. You have no-one to ask for help, no-one to hear you scream, and yet you have to make forward progress.
A "Build tool" is a catch-all term that refers to anything that is needed to get a piece of software set up, but isn't needed after that. Different programming communities have a wealth of different tools: some use stalwarts like make, some use loose collections of .sh scripts, some use XML-based tools like Maven or Ant, JSON-based tools like Grunt, or code-based tools like Gulp, Grunt or SBT.
Each of these tools does different things and has a different set of trade-offs associated with it. Given all these different designs, and the different things each tool does, what are the common features that build tools provide that people want? How do existing tools stack up against the things that people want build tools to do?
"Naming things" is one of those traditionally "hard problems" in software engineering. The Scala programming language gives you more tools than most languages do to manage names: apart from picking alphanumeric names of arbitrary length, you can also name things using operators, or in many cases not names things at all using language features like
apply or the
_ placeholder parameter.
However, the fact that code ends up "too concise" is itself one of the most common complaints leveled against the Scala programming language. How can we pick the right balance of verbosity and conciseness, at the right times, to ensure future maintainers of our software do not end up hating us?
I've given a bunch of talks at meetup groups, industry conferences and academic workshops. Most of them are about my work in the Scala programming language. The actual recordings for these are slightly scattered, over a mix of Youtube videos, Vimeo, and some conference sites.
Here's a consolidated list of their abstracts and videos, most recent first. I'll keep this updated as time goes on.
The Scala language is large and complex, and it provides a variety of tools that a developer can use to do the same thing in a variety of ways. Given the range of possible solutions to every problem, how can a developer choose which one should be used? This is the first in a series of blog posts aiming to provide style guidelines at a "strategic" level. Above the level of "how much whitespace should I use" or camelCase vs PascalCase, it should help a developer working with the Scala language choose from the buffet of possible solutions.
Programmers usually view old, legacy code with a mix of fear and respect. As a professional software engineer, you often have to deal with systems written long before you turned up. You have to dive into them, understand them, review them, and improve them. You often wonder...
Why did they do this?
What were they thinking?
Sometimes to realize later that the "they" is, in fact, yourself!
I am not the first person to have a blog, nor the first person to have a programming blog, nor is this my first blog. Nevertheless, for me, doing this is a mix of new and old ideas.