This post will dive into the runtime characteristics of the Scala collections library, from an empirical point of view. While a lot has been written about the Scala collections from an implementation point of view (inheritance hierarchies, CanBuildFrom, etc...) surprisingly little has been written about how these collections actually behave under use.
Lists faster than
Vectors for what you're doing, or are
Vectors faster than
Lists? How much memory can you save by using un-boxed
Arrays to store primitives? When you do performance tricks like pre-allocating arrays or using a
while-loop instead of a
foreach call, how much does it really matter?
var l: List or
val b: mutable.Buffer? This post will tell you the answers.
Parsing structured text into data structures has always been a pain. If you are like me, you may have wondered: why do all the parsing tools seem to be parser-generators instead of just configurable-parsers? After all, when you look at, say, a 2D physics library like Chipmunk2D, you just get a bunch of classes and functions you can call. In contrast, parsing libraries like YACC or ANTLR often seem to require custom build steps, compile-time source-code-generation, and other confusing things which you never see in most "normal" libraries.
It turns out, simple libraries do exist for parsing, under the name "Parser Combinators". While not as well known, these parser combinator libraries expose a bunch of classes and functions you can use to build a parser in a convenient way: without the boilerplate of hand-written recursive-descent parsing, the fragiity of Regexes, or the complexity of code-gen tools like ANTLR. This post will explore one such library, FastParse, and show how parser combinators can make the process of parsing structured text simple, easy and fun.Comments
Three years ago, I downloaded the nascent Scala.js compiler and tried to use it on a toy project.
Since then, it has matured greatly: the compiler itself is rock-solid. It has a huge ecosystem of libraries. It has a vibrant community, been adopted by some of the largest commercial users of the Scala language, and is playing a key role in shaping evolution of the language. By any measure, it is a success, and I was one of the key people who evangelized it and built foundations for the open-source community and ecosystem that now exists.
The Scala programming language has traditionally been a tool used for building "serious business" systems: compilers, big data, distributed systems or large web applications. With the advent of Scala.js, people are starting to use it for front-end Web work, while the Ammonite-REPL has turned it into a pleasant interactive experience.
This post will explore the new Scala Scripting functionality in the Ammonite project, and use it in the context of creating your own DIY blog engine in 15 minutes. We'll see how it compares to both the status-quo Scala programming experience, other scripting languages like Python or Bash, and what place it can find in your Scala programming toolbox.Comments
Everyone is used to programs printing out output in a terminal that scrolls as new text appears, but that's not all your can do: your program can color your text, move the cursor up, down, left or right, or clear portions of the screen if you are going to re-print them later. This is what lets programs like Git implement its dynamic progress indicators, and Vim or Bash implement their editors that let you modify already-displayed text without scrolling the terminal.
There are libraries like Readline, JLine, or the Python Prompt Toolkit that help you do this in various programming languages, but you can also do it yourself. This post will explore the basics of how you can control the terminal from any command-line program, with examples in Python, and how your own code can directly make use of all the special features the terminal has to offer.Comments
When programming in Scala, there are two main ways of avoiding repetition: you can define functions to represent commonly-used procedures or computations, and you can define data-types, e.g. using
case classes, to represent commonly-used bundles of data that you tend to pass around or use together.
Lots of people have opinions about functions: they should be "pure", not too long, not be indented more than this much, etc. etc. etc.. Much less has been written about what a good data-type looks like, even though they play just as important a role in your Scala codebase. This post will explore some of the considerations and guidelines I follow when designing the
case classes that make up my Scala programs, and how you can apply them to your own Scala code.
"Micro-optimization" is normally used to describe low-level optimizations that do not change the overall structure of the program; this is as opposed to "high level" optimizations (e.g. choosing efficient algorithms, caching things, or parallelizing things) that often require broader changes to your code. Things like removing intermediate objects to minimize memory allocations, or using bit-sets rather than
HashSets to speed up lookups, are examples of micro-optimizations.
Micro-optimization has a bad reputation, and is especially uncommon in the Scala programming language where the community is more interested in other things such as proofs, fancy usage of static types, or distributed systems. Often, it is viewed as a maintainability cost with few benefits. This post will demonstrate the potential benefit of micro-optimizations, and how it can be a valuable technique to have in your toolbox of programming techniques.Comments
This post explores how you can make use of the type-safety of the Scala programming language to help catch the mistakes you make when writing Scala programs.
While Scala is has a compiler that can help you catch errors, and many call it "type-safe", there is in fact a whole range of ways you can write Scala that provide greater- or lesser- amounts of safety. We will discuss various techniques that you can use to shift your code to the "safer" side of the spectrum. We'll consciously ignore the theoretical side of things with it's absolute proofs and logic, and focus on the practical side of how to make the Scala compiler catch more of your dumb bugs.Comments
The Singapore Smart Nation initiative is a government push to try and improve the efficiency of Singapore, as a country, using technology. Among other projects, there has been a push to make real-time data openly available so that everyone, from individuals to corporations, can use it creatively to make solutions to day-to-day municipal problems.
This post will explore one such dataset: the LTA Data Mall, by Singapore's Land Transport Authority. This dataset provides both offline geographical data on roads & public transport, as well as real-time data on things like bus arrivals and taxis. Using this dataset, the Python programming language, and basic programming and data-science techniques, we will build a trip planner to find the shortest bus commute from A to B, but powered by real data and bounded by real-world limitations.
From registering an API key, fetching data from an endpoint, sanitizing and understanding the data, implementing algorithms like a Breadth First Search or Dijkstra's Algorithm, refining the search, and finally evaluating its ability to plan useful and correct bus trips. You'll get a full tour of the process involved in making good use of public datasets!Comments
This post will walk you through an exercise in diving into someone else's code. The goal will be to make an arbitrary change to the code of the Spyder Python IDE, a project I have never touched before in my life, and learn just enough about it to accomplish what I want without getting bogged down. You will learn how to approach problems without the rigour taught in formal education, and instead with guesswork, experimentation, and insight learned in a professional environment. You will see first-hand the joys, sorrows and frustrations trying to navigate the project, culminating in a working (if rough) patch adding a feature to a large, unfamiliar codebase.
Everyone who learns programming has written a pile of code in a bunch of different programs: whether it's implementing algorithms for problem sets, building websites, or making video games. Being a professional software engineer, on the other hand, very often does not involve "writing lots of code"! More often, you spend your time spelunking deep into other peoples code: code you do not understand, did not write, and have possibly never seen before in your life. You have no-one to ask for help, no-one to hear you scream, and yet you have to make forward progress.Comments
A "Build tool" is a catch-all term that refers to anything that is needed to get a piece of software set up, but isn't needed after that. Different programming communities have a wealth of different tools: some use stalwarts like
make, some use loose collections of
.sh scripts, some use XML-based tools like Maven or Ant, JSON-based tools like Grunt, or code-based tools like Gulp, Grunt or SBT.
Each of these tools does different things and has a different set of trade-offs associated with them. Given all these different designs, and different things each tool does, what are the common features that build tools provide that people want? How to existing tools stack up against the things that people want build tools to do?Comments
"Naming things" is one of those traditionally "hard problems" in software engineering. The Scala programming language gives you more tools than most languages do to manage names: apart from picking alphanumeric names of arbitrary length, you can also name things using operators, or in many cases not names things at all using language features like
apply or the
_ placeholder parameter.
However, the fact that code ends up "too concise" is itself one of the most common complaints leveled against the Scala programming language. How can we pick the right balance of verbosity and conciseness, at the right times, to ensure future maintainers of our software do not end up hating us?Comments
I've given a bunch of talks at meetup groups, industry conferences and academic workshops. Most of them are about my work in the Scala programming language. The actual recordings for these are slightly scattered, over a mix of Youtube videos, Vimeo, and some conference sites.
Here's a consolidated list of their abstracts and videos, most recent first. I'll keep this updated as time goes on.Comments
The Scala language is large and complex, and it provides a variety of tools that a developer can use to do the same thing in a variety of ways. Given the range of possible solutions to every problem, how can a developer choose which one should be used? This is the first in a series of blog posts aiming to provide style guidelines at a "strategic" level. Above the level of "how much whitespace should I use" or camelCase vs PascalCase, it should help a developer working with the Scala language choose from the buffet of possible solutions.Comments
Programmers usually view old, legacy code with a mix of fear and respect. As a professional software engineer, you often have to deal with systems written long before you turned up. You have to dive into them, understand them, review them, and improve them. You often wonder...
Why did they do this
What were they thinking?
Sometimes to realize later that the they is in fact, yourself!Comments
I am not the first person to have a blog, nor the first person to have a programming blog, nor is this my first blog. Nevertheless, for me, doing this is a mix of new and old ideas.Comments