Haoyi's Programming Blog

Table of Contents

Working with Databases using Scala and Quill

Posted 2019-10-09
Scraping Websites using Scala and JsoupHow an Optimizing Compiler Works

Most modern systems are backed by relational databases. This tutorial will walk you through the basis of using a relational database from Scala, using the Quill query library. We will work through small self-contained examples of how to store and query data within a Postgres database, and finish with converting an interactive chat website to use a Postgres database for data storage.


About the Author: Haoyi is a software engineer, and the author of many open-source Scala tools such as the Ammonite REPL and the Mill Build Tool. If you enjoyed the contents on this blog, you may also enjoy Haoyi's book Hands-on Scala Programming


Getting Started

Ammonite

To begin with, we will be using the Ammonite Scala REPL:

$ sudo sh -c '(echo "#!/usr/bin/env sh" && curl -L https://github.com/lihaoyi/Ammonite/releases/download/1.7.4/2.13-1.7.4) > /usr/local/bin/amm && chmod +x /usr/local/bin/amm' && amm
Loading...
Welcome to the Ammonite Repl 1.7.4
(Scala 2.13.1 Java 11.0.2)
If you like Ammonite, please support our development at www.patreon.com/lihaoyi
@

Library Setup

On top of Ammonite, we will also install the Quill database query library, and the OpenTable Embedded Postgres database library:

@ import $ivy.{
    `io.getquill::quill-jdbc:3.4.10`,
    `org.postgresql:postgresql:42.2.8`,
    `com.opentable.components:otj-pg-embedded:0.13.1`
  }
...
https://repo1.maven.org/maven2/com/opentable/components/otj-pg-embedded/0.13.1/otj-pg-embedded-0.13.…
  100.0% [##########] 50.5 MiB (154.1 KiB / s)
https://repo1.maven.org/maven2/com/opentable/components/otj-pg-embedded/0.13.1/otj-pg-embedded-0.13.…
  100.0% [##########] 50.5 MiB (142.7 KiB / s)

OpenTable's Embedded Postgres library will make it convenient to spin up a small Postgres database for us to work with, without needing to mess around with system libraries to install a Postgres database globally. For production use, you will likely be using a separately managed database server, perhaps hosted on Heroku Postgres or Amazon RDS.

To begin with we will start out test database:

@ import com.opentable.db.postgres.embedded.EmbeddedPostgres

@ val server = EmbeddedPostgres.builder().setPort(5432).start()
server: EmbeddedPostgres = EmbeddedPG-dee4bbc5-7e4e-4559-afb8-10155ecff124

Next, we will set up a Quill database context connected to our in-memory database:

@ {
  import io.getquill._
  import com.zaxxer.hikari.{HikariConfig, HikariDataSource}
  val pgDataSource = new org.postgresql.ds.PGSimpleDataSource()
  pgDataSource.setUser("postgres")
  val config = new HikariConfig()
  config.setDataSource(pgDataSource)
  val ctx = new PostgresJdbcContext(LowerCase, new HikariDataSource(config))
  import ctx._
  }

ctx will be our primary interface to the Postgres database through Quill. The HikariConfig and PGSimpleDataSource classes expose most of the things you can configure when connecting to the Postgres database, e.g. setUser above, but for now we will mostly stick the defaults.

PG-CLI

In a separate terminal, we will be using the open source PG-CLI tool to directly connect to our test Postgres database:

After installation, you can connect PG-CLI to the local test database via pgcli -U postgres:

$ pgcli -U postgres
Server: PostgreSQL 10.6
Version: 2.1.0
Chat: https://gitter.im/dbcli/pgcli
Mail: https://groups.google.com/forum/#!forum/pgcli
Home: http://pgcli.com
postgres@/tmp:postgres>

Sample Data

As sample data for this exercise, we will be using a Postgres version of the popular world.sql dataset:

--
-- PostgreSQL port of the MySQL "World" database.
--
-- The sample data used in the world database is Copyright Statistics
-- Finland, http://www.stat.fi/worldinfigures.
--

CREATE TABLE IF NOT EXISTS city (
    id integer NOT NULL,
    name varchar NOT NULL,
    countrycode character(3) NOT NULL,
    district varchar NOT NULL,
    population integer NOT NULL
);

CREATE TABLE IF NOT EXISTS country (
...

This can be downloaded from:

And imported into our postgres database using PG-CLI:

postgres@/tmp:postgres> \i world.sql;

PG-CLI uses the standard \d command to list tables and columns:

postgres@/tmp:postgres> \d
+----------+-----------------+--------+----------+
| Schema   | Name            | Type   | Owner    |
|----------+-----------------+--------+----------|
| public   | city            | table  | postgres |
| public   | country         | table  | postgres |
| public   | countrylanguage | table  | postgres |
+----------+-----------------+--------+----------+
postgres@/tmp:postgres> \d city
+-------------+-------------------+-------------+
| Column      | Type              | Modifiers   |
|-------------+-------------------+-------------|
| id          | integer           |  not null   |
| name        | character varying |  not null   |
| countrycode | character(3)      |  not null   |
| district    | character varying |  not null   |
| population  | integer           |  not null   |
+-------------+-------------------+-------------+
Indexes:
    "city_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "country" CONSTRAINT "country_capital_fkey" FOREIGN KEY (capital) REFERENCES city(id)
postgres@/tmp:postgres> \d country
+----------------+-------------------+-------------+
| Column         | Type              | Modifiers   |
|----------------+-------------------+-------------|
| code           | character(3)      |  not null   |
| name           | character varying |  not null   |
| continent      | character varying |  not null   |
| region         | character varying |  not null   |
| surfacearea    | real              |  not null   |
| indepyear      | smallint          |             |
| population     | integer           |  not null   |
| lifeexpectancy | real              |             |
| gnp            | numeric(10,2)     |             |
| gnpold         | numeric(10,2)     |             |
| localname      | character varying |  not null   |
| governmentform | character varying |  not null   |
| headofstate    | character varying |             |
| capital        | integer           |             |
| code2          | character(2)      |  not null   |
+----------------+-------------------+-------------+
Indexes:
    "country_pkey" PRIMARY KEY, btree (code)
Foreign-key constraints:
    "country_capital_fkey" FOREIGN KEY (capital) REFERENCES city(id)
Referenced by:
    TABLE "countrylanguage" CONSTRAINT "countrylanguage_countrycode_fkey" FOREIGN KEY (countrycode) R
postgres@/tmp:postgres> \d countrylanguage
+-------------+-------------------+-------------+
| Column      | Type              | Modifiers   |
|-------------+-------------------+-------------|
| countrycode | character(3)      |  not null   |
| language    | character varying |  not null   |
| isofficial  | boolean           |  not null   |
| percentage  | real              |  not null   |
+-------------+-------------------+-------------+
Indexes:
    "countrylanguage_pkey" PRIMARY KEY, btree (countrycode, language)
Foreign-key constraints:
    "countrylanguage_countrycode_fkey" FOREIGN KEY (countrycode) REFERENCES country(code)

Mapping Tables

Quill expects tables to be represented by Scala case classes, with individual columns within each table mapping to primitive data types. As a first approximation, the mapping is as follows:

Postgres Scala
real Float, Double
boolean Boolean
integer, smallint, bigint Int, Long
character(n), character varying String
numeric(n,m) java.math.BigDecimal

This particular Postgres database uses a lowercase variable name convention, which doesn't quite match Scala's PascalCase for class names and camelCase for field names. To map between these two conventions, we had earlier defined our Quill ctx to use a LowerCase name mapper:

val ctx = new PostgresJdbcContext(LowerCase, new HikariDataSource(config))

There are also SnakeCase, UpperCase, or CamelCase name mappers you can use if your database uses those naming conventions.

Defining the case classes representing each table is straightforward. Here is the mapping from table schema to case class for the city table:

postgres@/tmp:postgres> \d city
+-------------+-------------------+-------------+
| Column      | Type              | Modifiers   |
|-------------+-------------------+-------------|
| id          | integer           |  not null   |
| name        | character varying |  not null   |
| countrycode | character(3)      |  not null   |
| district    | character varying |  not null   |
| population  | integer           |  not null   |
+-------------+-------------------+-------------+
case class City(id: Int, 
                name: String, 
                countryCode: String, 
                district: String, 
                population: Int)

The country table:

postgres@/tmp:postgres> \d country
+----------------+-------------------+-------------+
| Column         | Type              | Modifiers   |
|----------------+-------------------+-------------|
| code           | character(3)      |  not null   |
| name           | character varying |  not null   |
| continent      | character varying |  not null   |
| region         | character varying |  not null   |
| surfacearea    | real              |  not null   |
| indepyear      | smallint          |             |
| population     | integer           |  not null   |
| lifeexpectancy | real              |             |
| gnp            | numeric(10,2)     |             |
| gnpold         | numeric(10,2)     |             |
| localname      | character varying |  not null   |
| governmentform | character varying |  not null   |
| headofstate    | character varying |             |
| capital        | integer           |             |
| code2          | character(2)      |  not null   |
+----------------+-------------------+-------------+  
case class Country(code: String, 
                   name: String, 
                   continent: String, 
                   region: String,
                   surfaceArea: Double,
                   indepYear: Option[Int],
                   population: Int,
                   lifeExpectancy: Option[Double],
                   gnp: Option[scala.math.BigDecimal],
                   gnpold: Option[scala.math.BigDecimal],
                   localName: String,
                   governmentForm: String,
                   headOfState: Option[String],
                   capital: Option[Int],
                   code2: String)

Optional values which do not have a Postgres not null flag set are modeled using Scala Option[T]s

Lastly, here is the mapping case class for the countrylanguage table:

postgres@/tmp:postgres> \d countrylanguage
+-------------+-------------------+-------------+
| Column      | Type              | Modifiers   |
|-------------+-------------------+-------------|
| countrycode | character(3)      |  not null   |
| language    | character varying |  not null   |
| isofficial  | boolean           |  not null   |
| percentage  | real              |  not null   |
+-------------+-------------------+-------------+
case class CountryLanguage(countrycode: String, 
                           language: String,
                           isOfficial: Boolean,
                           percentage: Double)

You can then try out basic queries to fetch the various tables and map them to the Scala case classes:

@ ctx.run(query[City])
cmd25.sc:1: SELECT x.id, x.name, x.countrycode, x.district, x.population FROM city x
val res25 = ctx.run(query[City])
                   ^
res25: List[City] = List(
  City(1, "Kabul", "AFG", "Kabol", 1780000),
  City(2, "Qandahar", "AFG", "Qandahar", 237500),
  City(3, "Herat", "AFG", "Herat", 186800),
  City(4, "Mazar-e-Sharif", "AFG", "Balkh", 127800),
  City(5, "Amsterdam", "NLD", "Noord-Holland", 731200),
  City(6, "Rotterdam", "NLD", "Zuid-Holland", 593321),
  City(7, "Haag", "NLD", "Zuid-Holland", 440900),
...
@ ctx.run(query[Country])
cmd26.sc:1: SELECT x.code, x.name, x.continent, x.region, x.surfacearea, x.indepyear, x.population, x.lifeexpectancy, x.gnp, x.gnpold, x.localname, x.governmentform, x.headofstate, x.capital, x.code2 FROM country x
val res26 = ctx.run(query[Country])
                   ^
res26: List[Country] = List(
  Country(
    "AFG",
    "Afghanistan",
    "Asia",
    "Southern and Central Asia",
    652090.0,
    Some(1919),
    22720000,
    Some(45.9000015),
    Some(5976.00),
...
@ ctx.run(query[CountryLanguage])
cmd27.sc:1: SELECT x.countrycode, x.language, x.isofficial, x.percentage FROM countrylanguage x
val res27 = ctx.run(query[CountryLanguage])
                   ^
res27: List[CountryLanguage] = List(
  CountryLanguage("AFG", "Pashto", true, 52.4000015),
  CountryLanguage("NLD", "Dutch", true, 95.5999985),
  CountryLanguage("ANT", "Papiamento", true, 86.1999969),
  CountryLanguage("ALB", "Albaniana", true, 97.9000015),
  CountryLanguage("DZA", "Arabic", true, 86.0),
  CountryLanguage("ASM", "Samoan", true, 90.5999985),
...

Note that dumping the entire database table to in-memory case class objects is probably not something you want to do to your large production database, but on this small sample database it's unlikely to cause issues.

Notice also how in each case, when compiling your code before the command runs, Quill prints out the exact SQL query that is being prepared and will be executed at runtime. This can be useful if your Quill query is misbehaving and you are trying to figure out what exactly it is trying to do. You can always go to your PG-CLI console and enter that same SQL directly into the console:

postgres@/tmp:postgres> SELECT x.countrycode, x.language, x.isofficial, x.percentage FROM countrylang
 uage x;
+---------------+---------------------------+--------------+--------------+
| countrycode   | language                  | isofficial   | percentage   |
|---------------+---------------------------+--------------+--------------|
| AFG           | Pashto                    | True         | 52.4         |
| NLD           | Dutch                     | True         | 95.6         |
| ANT           | Papiamento                | True         | 86.2         |
| ALB           | Albaniana                 | True         | 97.9         |
| DZA           | Arabic                    | True         | 86.0         |
| ASM           | Samoan                    | True         | 90.6         |
| AND           | Spanish                   | False        | 44.6         |
| AGO           | Ovimbundu                 | False        | 37.2         |
| AIA           | English                   | True         | 0.0          |
| ATG           | Creole English            | False        | 95.7         |
| ARE           | Arabic                    | True         | 42.0         |
...

Querying Data

Now that we have a simple sample database set up, and have configured Quill to work with it, we can now start performing more interesting queries.

Filtering

A Scala .filter translates into a SQL WHERE clause. You can use that to find individual entries by name:

@ ctx.run(query[City].filter(_.name == "Singapore"))
cmd6.sc:1: SELECT x1.id, x1.name, x1.countrycode, x1.district, x1.population FROM city x1 WHERE x1.name = 'Singapore'
val res6 = ctx.run(query[City].filter(_.name == "Singapore"))
                  ^
res6: List[City] = List(City(3208, "Singapore", "SGP", "\u0096", 4017733))

Find entries by id:

@ ctx.run(query[City].filter(_.id == 3208))
cmd13.sc:1: SELECT x1.id, x1.name, x1.countrycode, x1.district, x1.population FROM city x1 WHERE x1.id = 3208
val res13 = ctx.run(query[City].filter(_.id == 3208))
                   ^
res13: List[City] = List(City(3208, "Singapore", "SGP", "\u0096", 4017733))

Or to find all entries that match arbirary predicates, e.g. based on population below:

@ ctx.run(query[City].filter(_.population > 9000000))
cmd14.sc:1: SELECT x1.id, x1.name, x1.countrycode, x1.district, x1.population FROM city x1 WHERE x1.population > 9000000
val res14 = ctx.run(query[City].filter(_.population > 9000000))
                   ^
res14: List[City] = List(
  City(206, "S\u00e3o Paulo", "BRA", "S\u00e3o Paulo", 9968485),
  City(939, "Jakarta", "IDN", "Jakarta Raya", 9604900),
  City(1024, "Mumbai (Bombay)", "IND", "Maharashtra", 10500000),
  City(1890, "Shanghai", "CHN", "Shanghai", 9696300),
  City(2331, "Seoul", "KOR", "Seoul", 9981619),
  City(2822, "Karachi", "PAK", "Sindh", 9269265)
)

Predicates can have more than one clauses, e.g. here we filter on both population and countryCode:

@ ctx.run(query[City].filter(c => c.population > 5000000 && c.countryCode == "CHN"))
cmd15.sc:1: SELECT c.id, c.name, c.countrycode, c.district, c.population FROM city c WHERE c.population > 5000000 AND c.countrycode = 'CHN'
val res15 = ctx.run(query[City].filter(c => c.population > 5000000 && c.countryCode == "CHN"))
                   ^
res15: List[City] = List(
  City(1890, "Shanghai", "CHN", "Shanghai", 9696300),
  City(1891, "Peking", "CHN", "Peking", 7472000),
  City(1892, "Chongqing", "CHN", "Chongqing", 6351600),
  City(1893, "Tianjin", "CHN", "Tianjin", 5286800)
)

Or be broken up into more than one call to .filter:

@ ctx.run(query[City].filter(_.population > 5000000).filter(_.countryCode == "CHN"))
cmd16.sc:1: SELECT x1.id, x1.name, x1.countrycode, x1.district, x1.population FROM city x1 WHERE x1.population > 5000000 AND x1.countrycode = 'CHN'
val res16 = ctx.run(query[City].filter(_.population > 5000000).filter(_.countryCode == "CHN"))
                   ^
res16: List[City] = List(
  City(1890, "Shanghai", "CHN", "Shanghai", 9696300),
  City(1891, "Peking", "CHN", "Peking", 7472000),
  City(1892, "Chongqing", "CHN", "Chongqing", 6351600),
  City(1893, "Tianjin", "CHN", "Tianjin", 5286800)
)

If there are relevant table indices present, the WHERE clause generated by filters will make use of them to speed up the lookup. Otherwise it may end up doing a slow full table scan. A detailed discussion of table index performance is beyond the scope of this article.

Lifting

If you want to include dynamic values in your queries, e.g. filtering by a value that isn't a constant, you have to use the lift syntax:

@ def find(cityId: Int) = ctx.run(query[City].filter(_.id == lift(cityId)))
cmd22.sc:1: SELECT x1.id, x1.name, x1.countrycode, x1.district, x1.population FROM city x1 WHERE x1.id = ?
def find(cityId: Int) = ctx.run(query[City].filter(_.id == lift(cityId)))
                               ^
defined function find

@ find(3208)
res23: List[City] = List(City(3208, "Singapore", "SGP", "\u0096", 4017733))

@ find(3209)
res24: List[City] = List(City(3209, "Bratislava", "SVK", "Bratislava", 448292))

Notice how this query, with a variable lift(cityId) interpolated into it, is converted into a parametrized SQL query with a WHERE x1.id = ? clause. This avoids SQL injection vulnerabilities and improves database query optimization performance.

In general, anything within a ctx.run(...) clause is converted to SQL to run within your database, with the exception of the contents of lift(...) calls which run locally in your Scala program before being spliced into the query. That means that only a small set of operations are allowed on the converted-to-SQL code, with invalid operations giving you compile time errors:

@ ctx.run(query[City].filter(_.name.length == 1))
cmd25.sc:1: Tree 'x$1.name.length()' can't be parsed to 'Ast'
val res25 = ctx.run(query[City].filter(_.name.length == 1))
                                      ^
Compilation Failed

@ ctx.run(query[City].filter(_.name.substring(0, 1) == "S"))
cmd25.sc:1: Tree 'x$1.name.substring(0, 1)' can't be parsed to 'Ast'
val res25 = ctx.run(query[City].filter(_.name.substring(0, 1) == "S"))
                                      ^
Compilation Failed

Mapping

Often you do not need all the values in a particular table. For example, the country table has 15 different values per row, and if you are only interested in 2-3 of them, fetching them all is a waste of CPU time, memory, and network bandwidth. You can thus use .map to pick the columns that you are interested in:

@ ctx.run(query[Country].map(c => (c.name, c.continent)))
cmd26.sc:1: SELECT c.name, c.continent FROM country c
val res26 = ctx.run(query[Country].map(c => (c.name, c.continent)))
                   ^
res26: List[(String, String)] = List(
  ("Afghanistan", "Asia"),
  ("Netherlands", "Europe"),
  ("Netherlands Antilles", "North America"),
  ("Albania", "Europe"),
  ("Algeria", "Africa"),
...

@ ctx.run(query[Country].map(c => (c.name, c.continent, c.population)))
cmd27.sc:1: SELECT c.name, c.continent, c.population FROM country c
val res27 = ctx.run(query[Country].map(c => (c.name, c.continent, c.population)))
                   ^
res27: List[(String, String, Int)] = List(
  ("Afghanistan", "Asia", 22720000),
  ("Netherlands", "Europe", 15864000),
  ("Netherlands Antilles", "North America", 217000),
  ("Albania", "Europe", 3401200),
  ("Algeria", "Africa", 31471000),
...

You can combine the various operations in any order, e.g. here is a parametrized query that combines a filter and lift with a map to fetch the name of the city with a particular ID:

@ def findName(cityId: Int) = ctx.run(query[City].filter(_.id == lift(cityId)).map(_.name))
cmd32.sc:1: SELECT x1.name FROM city x1 WHERE x1.id = ?
def findName(cityId: Int) = ctx.run(query[City].filter(_.id == lift(cityId)).map(_.name))
                                   ^
defined function findName

@ findName(3208)
res33: List[String] = List("Singapore")

@ findName(3209)
res34: List[String] = List("Bratislava")

Joins

Joins allow you to make use of data split across multiple tables. For example, if I want to query "the name of every city in the continent of Asia", the city names are in the city table, but the continent name is in the country table. You can use joins to do so:

@ ctx.run(
    query[City]
      .join(query[Country])
      .on{case (city, country) => city.countryCode == country.code}
      .filter{case (city, country) => country.continent == "Asia"}
      .map{case (city, country) => city.name}
  )
cmd49.sc:1: SELECT x01.name FROM city x01 INNER JOIN country x11 ON x01.countrycode = x11.code WHERE x11.continent = 'Asia'
val res49 = ctx.run(
                   ^
res49: List[String] = List(
  "Kabul",
  "Qandahar",
  "Herat",
  "Mazar-e-Sharif",
  "Dubai",
  "Abu Dhabi",
  "Sharja",
  "al-Ayn",
  "Ajman",
  "Yerevan",
...

You can also join more than two tables, as long as there is some sort of key you can use to match the relevant rows in each table, similar to how above we are matching city.countryCode == country.code.

Actions

Inserts

You can use the .insert method to insert data into a database table:

@ ctx.run(query[City].insert(City(10000, "test", "TST", "Test County", 0)))
cmd36.sc:1: INSERT INTO city (id,name,countrycode,district,population) VALUES (10000, 'test', 'TST', 'Test County', 0)
val res36 = ctx.run(query[City].insert(City(10000, "test", "TST", "Test County", 0)))
                   ^
res36: Long = 1L

@ ctx.run(query[City].filter(_.population == 0))
cmd37.sc:1: SELECT x1.id, x1.name, x1.countrycode, x1.district, x1.population FROM city x1 WHERE x1.population = 0
val res37 = ctx.run(query[City].filter(_.population == 0))
                   ^
res37: List[City] = List(City(10000, "test", "TST", "Test County", 0))

There is also a batch insertion syntax, using liftQuery and foreach:

@ ctx.run(
    liftQuery(List(
      City(10001, "testville", "TSV", "Test County", 0)  ,
      City(10002, "testopolis", "TSO", "Test County", 0),
      City(10003, "testberg", "TSB", "Test County", 0)
    )).foreach(e => query[City].insert(e))
  )
cmd38.sc:1: INSERT INTO city (id,name,countrycode,district,population) VALUES (?, ?, ?, ?, ?)
val res38 = ctx.run(
                   ^
res38: List[Long] = List(1L, 1L, 1L)

@ ctx.run(query[City].filter(_.population == 0))
cmd39.sc:1: SELECT x1.id, x1.name, x1.countrycode, x1.district, x1.population FROM city x1 WHERE x1.population = 0
val res39 = ctx.run(query[City].filter(_.population == 0))
                   ^
res39: List[City] = List(
  City(10000, "test", "TST", "Test County", 0),
  City(10001, "testville", "TSV", "Test County", 0),
  City(10002, "testopolis", "TSO", "Test County", 0),
  City(10003, "testberg", "TSB", "Test County", 0)
)

Updates

You can use .update to replace an entire row with a new one:

@ ctx.run(query[City].filter(_.id == 10000).update(City(10000, "testham", "TST", "Test County", 0)))
cmd40.sc:1: UPDATE city SET id = 10000, name = 'testham', countrycode = 'TST', district = 'Test County', population = 0 WHERE id = 10000
val res40 = ctx.run(query[City].filter(_.id == 10000).update(City(10000, "testham", "TST", "Test County", 0)))
                   ^
res40: Long = 1L

@ ctx.run(query[City].filter(_.id == 10000))
cmd41.sc:1: SELECT x1.id, x1.name, x1.countrycode, x1.district, x1.population FROM city x1 WHERE x1.id = 10000
val res41 = ctx.run(query[City].filter(_.id == 10000))
                   ^
res41: List[City] = List(City(10000, "testham", "TST", "Test County", 0))

Update individual values within the row:

@ ctx.run(query[City].filter(_.id == 10000).update(_.name -> "testford"))
cmd42.sc:1: UPDATE city SET name = 'testford' WHERE id = 10000
val res42 = ctx.run(query[City].filter(_.id == 10000).update(_.name -> "testford"))
                   ^
res42: Long = 1L

@ ctx.run(query[City].filter(_.id == 10000))
cmd43.sc:1: SELECT x1.id, x1.name, x1.countrycode, x1.district, x1.population FROM city x1 WHERE x1.id = 10000
val res43 = ctx.run(query[City].filter(_.id == 10000))
                   ^
res43: List[City] = List(City(10000, "testford", "TST", "Test County", 0))

Or update multiple rows at once:

@ ctx.run(query[City].filter(_.district == "Test County").update(_.district -> "Test Borough"))
cmd44.sc:1: UPDATE city SET district = 'Test Borough' WHERE district = 'Test County'
val res44 = ctx.run(query[City].filter(_.district == "Test County").update(_.district -> "Test Borough"))
                   ^
res44: Long = 4L

@ ctx.run(query[City].filter(_.population == 0))
cmd45.sc:1: SELECT x1.id, x1.name, x1.countrycode, x1.district, x1.population FROM city x1 WHERE x1.population = 0
val res45 = ctx.run(query[City].filter(_.population == 0))
                   ^
res45: List[City] = List(
  City(10001, "testville", "TSV", "Test Borough", 0),
  City(10002, "testopolis", "TSO", "Test Borough", 0),
  City(10003, "testberg", "TSB", "Test Borough", 0),
  City(10000, "testford", "TST", "Test Borough", 0)
)

Transactions

One of the primary features of a database is transactionality: the ability to start a transaction, perform some queries and updates in your isolated from any changes others may be making, and then either committing your changes atomically or rolling them back at the end of the transaction. Quill supports this via the ctx.transaction{...} syntax:

@ ctx.transaction{
    ...
  }

Any updates within the transaction are only committed when the transaction completes: any other processes querying the database will not see any half-baked changes. Furthermore, if the transaction fails with an exception, the changes are never committed:

@ ctx.transaction{
    ctx.run(query[City].filter(_.district == "Test Borough").update(_.district -> "Test County"))
    throw new Exception()
  }
cmd46.sc:2: UPDATE city SET district = 'Test County' WHERE district = 'Test Borough'
  ctx.run(query[City].filter(_.district == "Test Borough").update(_.district -> "Test County"))
         ^
java.lang.Exception
  ammonite.$sess.cmd46$.$anonfun$res46$1(cmd46.sc:3)
  io.getquill.context.jdbc.JdbcContext.$anonfun$transaction$2(JdbcContext.scala:81)
  scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
  io.getquill.context.jdbc.JdbcContext.$anonfun$transaction$1(JdbcContext.scala:77)
  io.getquill.context.jdbc.JdbcContext.$anonfun$withConnection$1(JdbcContext.scala:61)
  scala.Option.getOrElse(Option.scala:189)
  io.getquill.context.jdbc.JdbcContext.withConnection(JdbcContext.scala:59)
  io.getquill.context.jdbc.JdbcContext.transaction(JdbcContext.scala:76)
  ammonite.$sess.cmd46$.<init>(cmd46.sc:1)
  ammonite.$sess.cmd46$.<clinit>(cmd46.sc)


@ ctx.run(query[City].filter(_.population == 0))
cmd47.sc:1: SELECT x1.id, x1.name, x1.countrycode, x1.district, x1.population FROM city x1 WHERE x1.population = 0
val res47 = ctx.run(query[City].filter(_.population == 0))
                   ^
res47: List[City] = List(
  City(10001, "testville", "TSV", "Test Borough", 0),
  City(10002, "testopolis", "TSO", "Test Borough", 0),
  City(10003, "testberg", "TSB", "Test Borough", 0),
  City(10000, "testford", "TST", "Test Borough", 0)
)

As you can see, even though the update call completed, the exception caused the transaction to abort, and thus the "Test Borough" columns in the City table were never updated. This applies to both exceptions that happen accidentally, in the process of executing your code, and also to exceptions you throw yourself e.g. if you want to intentionally abort a transaction and discard the changes.

Transactions are a very useful tool to maintain the data integrity of your database.

Note that it is also possible for a transaction to fail due to a conflict: e.g. if two concurrent transactions are reading and writing the same row at the same time. In such a case, the first transaction to complete wins, and the later transaction aborts and discards its changes. For more details on how Postgres transactions work and how they can be configured, check out the Postgres documentation on Transaction Isolation.

Updating our Chat Website

In an earlier tutorial, we built a simple chat website: users could enter messages in a chat room, where they were made available for other users to see:

The full code for that chat website is as follows:

package app
import scalatags.Text.all._
object MinimalApplication extends cask.MainRoutes{
  var messages = Vector(
    ("alice", "Hello World!"),
    ("bob", "I am cow, hear me moo"),
    ("charlie", "I weigh twice as you"),
  )

  var openConnections = Set.empty[cask.WsChannelActor]

  def messageList() = frag(for((name, msg) <- messages) yield p(b(name), " ", msg))

  @cask.postJson("/")
  def postHello(name: String, msg: String) = {
    if (name == "") ujson.Obj("success" -> false, "txt" -> "Name cannot be empty")
    else if (name.length >= 10) ujson.Obj("success" -> false, "txt" -> "Name cannot be longer than 10 characters")
    else if (msg == "") ujson.Obj("success" -> false, "txt" -> "Message cannot be empty")
    else if (msg.length >= 160) ujson.Obj("success" -> false, "txt" -> "Message cannot be longer than 160 characters")
    else {
      messages = messages :+ (name -> msg)
      val notification = cask.Ws.Text(
        ujson.Obj("index" -> messages.length, "txt" -> messageList().render).render()
      )
      for(conn <- openConnections) conn.send(notification)
      openConnections = Set.empty
      ujson.Obj("success" -> true, "txt" -> messageList().render)
    }
  }

  @cask.get("/")
  def hello() = {
    html(
      head(
        link(
          rel := "stylesheet",
          href := "https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css"
        ),
        script(raw("""
          function submitForm(){
            fetch(
              "/",
              {method: "POST", body: JSON.stringify({name: nameInput.value, msg: msgInput.value})}
            ).then(response => response.json())
             .then(json => {
              if (json.success) {
                messageList.innerHTML = json.txt
                msgInput.value = ""
                errorDiv.innerText = ""
              } else {
                errorDiv.innerText = json.txt
              }
            })
          }

          var socket = new WebSocket("ws://" + location.host + "/subscribe");
          socket.onopen = function(ev){ socket.send("0") }
          socket.onmessage = function(ev){
            var json = JSON.parse(ev.data)
            messageList.innerHTML = json.txt
            socket.send("" + json.index)
          }
        """))
      ),
      body(
        div(cls := "container")(
          h1("Scala Chat!"),
          hr,
          div(id := "messageList")(
            messageList()
          ),
          hr,
          div(id := "errorDiv", color.red),
          form(onsubmit := "submitForm(); return false")(
            input(
              `type` := "text",
              id := "nameInput",
              placeholder := "User name",
              width := "20%"
            ),
            input(
              `type` := "text",
              id := "msgInput",
              placeholder := "Please write a message!",
              width := "60%"
            ),
            input(`type` := "submit", width := "20%")
          )
        )
      )
    ).render
  }

  @cask.websocket("/subscribe")
  def subscribe() = cask.WsHandler { connection =>
    cask.WsActor {
      case cask.Ws.Text(msg) =>
        if (msg.toInt < messages.length){
          connection.send(
            cask.Ws.Text(ujson.Obj("index" -> messages.length, "txt" -> messageList().render).render())
          )
        }else openConnections += connection
      case cask.Ws.Close(_, _) => openConnections -= connection
    }
  }

  initialize()
}

As implemented above, this webserver stores the messages as an in-memory Vector[Message]. While this is convenient, the weakness of such a setup is that if the chat server goes down - whether due to failure or updates - all messages are lost. The obvious solution would be to store the chat messages in a database that would persist the data regardless of whether the chat web server is running or not.

To make our website use Quill and Postgres, we first need to add the same libraries we used above to the build.sc file's def ivyDeps:

   def ivyDeps = Agg(
+    ivy"io.getquill::quill-jdbc:3.4.10",
+    ivy"org.postgresql:postgresql:42.2.8",
+    ivy"com.opentable.components:otj-pg-embedded:0.13.1",
     ivy"com.lihaoyi::scalatags:0.7.0",   
     ivy"com.lihaoyi::cask:0.3.0"
   )

Next, we need to replace the in-memory messages store with a database connection:


 object MinimalApplication extends cask.MainRoutes{
-   var messages = Vector(
-     ("alice", "Hello World!"),
-     ("bob", "I am cow, hear me moo"), 
-     ("charlie", "I weigh twice as you"),
-   )
+  case class Message(name: String, msg: String)
+  import com.opentable.db.postgres.embedded.EmbeddedPostgres
+  val server = EmbeddedPostgres.builder()
+    .setDataDirectory("data")
+    .setCleanDataDirectory(false)
+    .setPort(5432)
+    .start()
+  import io.getquill._
+  import com.zaxxer.hikari.{HikariConfig, HikariDataSource}
+  val pgDataSource = new org.postgresql.ds.PGSimpleDataSource()
+  pgDataSource.setUser("postgres")
+  val config = new HikariConfig()
+  config.setDataSource(pgDataSource)
+  val ctx = new PostgresJdbcContext(LowerCase, new HikariDataSource(config))
+  ctx.executeAction("""
+    CREATE TABLE IF NOT EXISTS message (
+      name text,
+      msg text
+    ); 
+  """)
+  import ctx._
+  def messages = ctx.run(query[Message].map(m => (m.name, m.msg)))
  
   var openConnections = Set.empty[cask.WsChannelActor]

Now, every call to messages will run a query on the Postgres database rather than read from the in-memory Vector. This ensures that not only do messages persist if the application process goes down, it means other application processes can access from the same database and read the same messages.

For now, we are just spawning the Postgres database in memory, using com.opentable.db.postgres.embedded.EmbeddedPostgres. For a real deployment you usually want a database deployed and managed separately from your application code. Note that although we are still using a local database, we did a setCleanDataDirectory(false) to ensure that the actual data being stored in the database persists between process and database restarts.

Next, we need to update the place we store the messages to instead write them to the database:

     else {
-      messages = messages :+ (name -> msg)
+      ctx.run(query[Message].insert(lift(Message(name, msg))))
       val notification = cask.Ws.Text(

And that's all we need to turn our in-memory chat website into a simple database-backed web service.

The complete code for MinimalApplication.scala now looks like this:

package app
import scalatags.Text.all._
object MinimalApplication extends cask.MainRoutes{
  case class Message(name: String, msg: String)
  import com.opentable.db.postgres.embedded.EmbeddedPostgres
  val server = EmbeddedPostgres.builder()
    .setDataDirectory("data")
    .setCleanDataDirectory(false)
    .setPort(5432)
    .start()
  import io.getquill._
  import com.zaxxer.hikari.{HikariConfig, HikariDataSource}
  val pgDataSource = new org.postgresql.ds.PGSimpleDataSource()
  pgDataSource.setUser("postgres")
  val config = new HikariConfig()
  config.setDataSource(pgDataSource)
  val ctx = new PostgresJdbcContext(LowerCase, new HikariDataSource(config))
  ctx.executeAction("""
    CREATE TABLE IF NOT EXISTS message (
      name text,
      msg text
    );
  """)
  import ctx._
  def messages = ctx.run(query[Message].map(m => (m.name, m.msg)))

  var openConnections = Set.empty[cask.WsChannelActor]

  @cask.get("/")
  def hello() = {
    html(
      head(
        link(
          rel := "stylesheet",
          href := "https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css"
        ),
        script(raw("""
          function submitForm(){
            fetch(
              "/",
              {method: "POST", body: JSON.stringify({name: nameInput.value, msg: msgInput.value})}
            ).then(response => response.json())
             .then(json => {
              if (json.success) {
                messageList.innerHTML = json.txt
                msgInput.value = ""
                errorDiv.innerText = ""
              } else {
                errorDiv.innerText = json.txt
              }
            })
          }
          var socket = new WebSocket("ws://" + location.host + "/subscribe");
          var eventIndex = 0
          socket.onopen = function(ev){ socket.send("" + eventIndex) }
          socket.onmessage = function(ev){
            var json = JSON.parse(ev.data)
            eventIndex = json.index
            socket.send("" + eventIndex)
            messageList.innerHTML = json.txt
          }
        """))
      ),
      body(
        div(cls := "container")(
          h1("Scala Chat!"),
          hr,
          div(id := "messageList")(
            messageList()
          ),
          hr,
          div(id := "errorDiv", color.red),
          form(onsubmit := "submitForm(); return false")(
            input(
              `type` := "text",
              id := "nameInput",
              placeholder := "User name",
              width := "20%"
            ),
            input(
              `type` := "text",
              id := "msgInput",
              placeholder := "Please write a message!",
              width := "60%"
            ),
            input(`type` := "submit", width := "20%")
          )
        )
      )
    ).render
  }

  def messageList() = frag(for((name, msg) <- messages) yield p(b(name), " ", msg))

  @cask.postJson("/")
  def postHello(name: String, msg: String) = {
    if (name == "") ujson.Obj("success" -> false, "txt" -> "Name cannot be empty")
    else if (name.length >= 10) ujson.Obj("success" -> false, "txt" -> "Name cannot be longer than 10 characters")
    else if (msg == "") ujson.Obj("success" -> false, "txt" -> "Message cannot be empty")
    else if (msg.length >= 160) ujson.Obj("success" -> false, "txt" -> "Message cannot be longer than 160 characters")
    else {
      ctx.run(query[Message].insert(lift(Message(name, msg))))
      val notification = cask.Ws.Text(
        ujson.Obj("index" -> messages.length, "txt" -> messageList().render).render()
      )
      for(conn <- openConnections) conn.send(notification)
      openConnections = Set.empty
      ujson.Obj("success" -> true, "txt" -> messageList().render)
    }
  }

  @cask.websocket("/subscribe")
  def subscribe() = cask.WsHandler { connection =>
    cask.WsActor {
      case cask.Ws.Text(msg) =>
        if (msg.toInt < messages.length){
          connection.send(
            cask.Ws.Text(ujson.Obj("index" -> messages.length, "txt" -> messageList().render).render())
          )
        }else openConnections += connection
      case cask.Ws.Close(_, _) => openConnections -= connection
    }
  }

  initialize()
}

Now you can try posting messages to the chat website, restarting the application process, and see that the messages persist across the restart.

Conclusion

In this tutorial, we have walked through how to work with a simple postgres database from Scala, using the Quill query library. Starting in the REPL, we seeded our database with a simple world.sql set of sample data, defined the mapping case classs, and explored using Quill to run queries which filtered, mapped, joined, inserted, and updated the data in our postgres database. Lastly, we saw how to wire up Quill into our simple chat webserver built in an earlier tutorial, giving our website the ability to persist data across process restarts.


About the Author: Haoyi is a software engineer, and the author of many open-source Scala tools such as the Ammonite REPL and the Mill Build Tool. If you enjoyed the contents on this blog, you may also enjoy Haoyi's book Hands-on Scala Programming


Scraping Websites using Scala and JsoupHow an Optimizing Compiler Works

Updated 2019-10-09