The web world has seen an incredible evolution of technologies over the last decade. From pages rendered on the server, to dynamically on the client, to back on the server.

Something that has not experienced the same rate of evolution is the relational database. To be clear: databases have evolved, but the interface has stayed largely the same. Most of the SQL you wrote 15 years ago will still work fine today. It is unlikely you can say the same about your web application.

In this article I’d like to show you the concept of database versioning and migrations. Specifically, the evolution of your database schema.

The target audience is anyone who works with relational databases. We will discuss the following two concepts:

  1. Database (Schema) Versioning
  2. Database (SQL) Migrations

By the end you will have learned how to version your database and how to write and apply SQL migrations.

Prerequisites

This article assumes you have a basic understanding of SQL and relational databases. You won't need to be able to implement a B-Tree in C, but CREATE TABLE should not look scary to you.

Docker should be installed on your system. This makes it easy to run various database drivers (e.g. Postgres, MySQL) and the migration framework. If you absolutely don't want to use Docker, having the database driver of your choice and the JDK installed will also work.

We will be using Flyway, which is written in Java, but you don't need to install or know any Java. A Docker container is all we need. This is why the title has "for everyone". It doesn't matter if you have a Node.js, Python or Go application. We are just concerned with the database.

What is Database Versioning?

You are probably already familiar with version control for source code. Every time you commit code, your project's HEAD version changes. You can easily see what version you are on currently, and pull the latest changes with a simple command (e.g. git pull).

Why don't we have the same for databases? Are the development and production environment running the same database schemas? When a new engineer joins and receives a private database instance, will it be the same as all other instances?

This is what database versioning is about. In addition to checking .SQL files into version control, we will also run a command that stores the latest version of the database in the database itself.

Versioning comes with applying migrations. Whenever you apply a new migration, you are creating a new database version.

What are Database Migrations?

The term “Database Migration” is somewhat confusing. We are not migrating from one database driver to another, say MySQL to Postgres, or Postgres to DynamoDB.

When we say "migration", we really mean migrating the database from one schema to the next. Or one "version" to the next. As codebase grows and new functionality is added, your database needs to change accordingly.

Imagine we have a database that holds books. Now we want to add authors to books. This is what we can achieve with a migration. We migrate the database from a schema that just has books, to a schema that has books and authors.

Solidify your knowledge

This is totally optional, but in order to see if you get the material and are not just passively reading, answer the following questions in your own head:

What is the difference between database versioning and database migrations?

Getting Started

In this section we'll set up everything we need for running migrations. We will set up a fresh Postgres database and the Flyway Docker container. This should take less than a minute.

Later in the article after we understand migrations, we will explore how to implement versioning and migrations on an existing database. So you can apply migrations on your next project or immediately start implementing them on your current projects.

Why Docker?

Flyway runs on the JVM. When I was mainly developing web appplications in JavaScript and Python, as soon as I saw Java or the JVM was involved I would move on. But wait! While Flyway is written in Java, you will not have to write any or understand a single line of Java for this to work. You don’t even need to install the JDK.

Why Flyway Over Other Existing Solutions?

Database migrations are not an unexplored landscape. Perhaps somewhat under-explored, though. Many other solutions are tied to a specific framework. I am personally a big fan of Django, but Django migrations don't work well outside of a Django application. The same could be said for Ruby on Rails for Ruby and TypeORM for TypeScript. You usually need to declare all your models in a specific format before being able to use migrations.

There are other tools but Flyway is unique in a few ways. First, it was created in 2010 and has been battle tested for over 14 years. That's 14 years of bug fixes and improvements. Something that is hard to compete with for these newer tools. If you want to be impressed, take a look at their Release Notes page, where you can scroll all the way back to 2010. That's a lot of learning and improving.

It also supports over 30 different database drivers at the time of writing. Not just your standard MySQL, Postgres and SQLite.

Last but not least: Migrations are just pure SQL. Whether you have a Django application, a Ruby on Rails application, a Go microservice or Spring Boot services, you can manage all of their database schemas using the same SQL.

Note: I am not affiliated with Red Gate or Flyway in any way. I am just passionate about software that works wonderfully.

Creating a Database

Depending on your background, you might already have various database instances running. Or maybe you're just getting started and somehow stumbled on this article when looking at ways to version your database.

In this section we will set up a Postgres database using Docker. If this is something you can do with your eyes closed, feel free to move on to Adding Flyway.

We'll start by creating a docker-compose.yml file in the directory of our choice.

services:
  postgres:
    image: postgres:16-alpine
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_DB=books
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    volumes:
      - ./data:/var/lib/postgresql/data
  1. We use postgres:16-alpine. Alpine is a smaller size base image of Linux.
  2. We expose port 5432 from the container to our host machine so that we can connect to it.
  3. We create a books database and access it using "`postgres" as both a username and password.
  4. We persist data on our file system in ./data, this folder is relative to our docker-compose.yml.

When running docker compose up it will start your Postgres database container!

Note: Maybe you have ran Postgres on your machine in the past. In that case you can get a “port in use” error, run lsof -i :5432 to see what process is taking up that port. You can then do kill -9 [pid] to shut it down. Do not do this on any environment other than your local one.

Our database does not have any tables or data yet. This is on purpose.

Adding Flyway

Now that we have our database up and running, it is time to add the flyway container.

services:
  postgres:
    ...
    
  flyway:
    depends_on:
      - postgres
    image: flyway/flyway:10
    volumes:
      - ./flyway/conf:/flyway/conf
      - ./flyway/migrations:/flyway/migrations

We added a new container and mounted two volumes:

  1. flyway/conf - the configuration file(s)
  2. flyway/migrations - the migration files

The flyway folder currently does not exist so we have to create it. Create a folder named flyway relative from your docker-compose file with 2 folders: conf and migrations. Inside conf create a new file named “flyway.toml”.

flyway
├── conf
│   └── flyway.toml
└── migrations

Almost done! We create the configuration file and we should be able to run flyway.

[flyway]
environment = "local"
locations = ["filesystem:/flyway/migrations"]

[environments.local]
url = "jdbc:postgresql://postgres:5432/books"
user = "postgres"
password = "postgres"

On production systems, you would not use plain text user/password. Instead, you should use environment variables.

Before we move on, let me explain the url you see here.

  1. jdbc: is a prefix for the Java Database Connectivity protocol. Used by Flyway to start a new database driver connection.
  2. postgresql:// is the environment agnostic address for postgresql. This is the same you’d use in your JavaScript applications using pg. If you use MySQL, this would by mysql://.
  3. postgres:5432 is the docker address. If you were running outside of Docker, it would be localhost:5432 or your production database URL.
    1. /books is our database name.

Let’s try it out.

$ docker compose run flyway info
[+] Creating 1/0
 ✔ Container flyway-postgres-1  Running                                                                                                                                      0.0s 
Flyway OSS Edition 10.15.0 by Redgate

See release notes here: https://rd.gt/416ObMi
Database: jdbc:postgresql://postgres:5432/books (PostgreSQL 16.3)
Schema history table "public"."flyway_schema_history" does not exist yet
Schema version: << Empty Schema >>

+----------+---------+-------------+------+--------------+-------+----------+
| Category | Version | Description | Type | Installed On | State | Undoable |
+----------+---------+-------------+------+--------------+-------+----------+
| No migrations found                                                       |
+----------+---------+-------------+------+--------------+-------+----------+

Success! We have ran Flyway and seen that there are no migrations in our empty database. We are now ready to move on.

Note: If you got a system architecture error, make sure to use the right platform variable. On my Apple Silicon MacBook I have to add platform: linux/arm64/v8 under image: flyway/flyway:10

Migration Formats

Flyway migrations have a specific format, depending on whether they are versioned, repeatable or “undo” migrations.

Versioned

The real meat and potatoes of Flyway. Migrations that are applied in order.

  1. V1__Add_book_table.sql
  2. V2__Add_author_table.sql
  3. V3__Add_isbn_column_to_book_because_we_forgot.sql

This is what makes your database versioned. Every new migration you create, adds a new version.

Repeatable

Repeatable migrations will run every time the content changes. This is useful for database VIEWs and reference tables. If you change any of the contents of a versioned migration, Flyway will throw an error. If you change the contents of a repeatable migration, it will simply run again.

  1. R__Popular_book_view.sql

Undo

A migration that is the opposite of your versioned migration. This is almost always a bad idea. Undo is not always possible if newer data depends on it. It's better to create another versioned migration that "undoes" what you want it to do instead of creating an undo migration.

  1. U__Bad_idea.sql

Creating Migrations

We’ll concern us with versioned migrations. As hinted to earlier, versioned migrations take the format of V[version]__[description].sql.

I have been using a bash alias that you can use to generate migration files. You can add the following to your .bashrc or .zshrc:

alias fly='f() { fname="V$(date +%Y%m%d%H%M%S)__$(echo "$1" | tr " " "_").sql"; touch "$fname" && echo "Created: $fname"; }; f'

This uses the current time, followed by 2 underscores, following by the text you passed as an alphanumeric file name. Now you can create a migration by calling fly from your terminal.

Fire up the terminal and open the ./flyway/migrations folder, then run fly:

$ fly "Create books table"
Created: V20240617233452__Create_books_table.sql

The migration is empty now. Add our CREATE TABLE statement to it.

CREATE TABLE books (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    author VARCHAR(255) NOT NULL,
    isbn VARCHAR(13) NOT NULL
);

After adding the SQL run the info command one more time:

$ docker compose run flyway info
[+] Creating 1/0
 ✔ Container flyway-postgres-1  Running                                                                                                                                      0.0s 
Flyway OSS Edition 10.15.0 by Redgate

See release notes here: https://rd.gt/416ObMi
Database: jdbc:postgresql://postgres:5432/books (PostgreSQL 16.3)
Schema history table "public"."flyway_schema_history" does not exist yet
Schema version: << Empty Schema >>

+-----------+----------------+--------------------+------+--------------+---------+----------+
| Category  | Version        | Description        | Type | Installed On | State   | Undoable |
+-----------+----------------+--------------------+------+--------------+---------+----------+
| Versioned | 20240617233452 | Create books table | SQL  |              | Pending | No       |
+-----------+----------------+--------------------+------+--------------+---------+----------+

Terrific. It picked up our migration “Create books table” with the correct version number. Notice that it has a type of “Pending”. This means that Flyway knows the migration has not been applied yet.

Applying Versioned Migrations

It is time to apply our first migration. I hope you’re excited.

$ docker compose run flyway migrate

[+] Creating 1/0
 ✔ Container flyway-postgres-1  Running                                                                                                                                      0.0s 
Flyway OSS Edition 10.15.0 by Redgate

See release notes here: https://rd.gt/416ObMi
Database: jdbc:postgresql://postgres:5432/books (PostgreSQL 16.3)
Schema history table "public"."flyway_schema_history" does not exist yet
Successfully validated 1 migration (execution time 00:00.009s)
Creating Schema History table "public"."flyway_schema_history" ...
Current version of schema "public": << Empty Schema >>
Migrating schema "public" to version "20240617233452 - Create books table"
Successfully applied 1 migration to schema "public", now at version v20240617233452 (execution time 00:00.013s)

Done no time! We now have our first migration applied through Flyway. If we try to run the migrate command again, we will see that our database is now “versioned”:

$ docker compose run flyway migrate
Database: jdbc:postgresql://postgres:5432/books (PostgreSQL 16.3)
Successfully validated 1 migration (execution time 00:00.025s)
Current version of schema "public": 20240617233452
Schema "public" is up to date. No migration necessary.

Hopefully you can see how great this is. Version control for our database! We check out migrations into version control, like we would with code.

Finally we can complete the example and add our authors and joining table.

$ fly "Add authors and join table" 
Created: V20240618100329__Add_authors_and_join_table.sql
CREATE TABLE author (
    id SERIAL PRIMARY KEY,
    first_name VARCHAR(255) NOT NULL,
    last_name VARCHAR(255) NOT NULL
);

CREATE TABLE book_author (
    book_id INT NOT NULL,
    author_id INT NOT NULL,
    PRIMARY KEY (book_id, author_id),
    FOREIGN KEY (book_id) REFERENCES books (id),
    FOREIGN KEY (author_id) REFERENCES author (id)
);

Run migrations and done!.

The amazing thing about these migrations being stored in your database schema, is that you can apply them to a fresh database whenever you want.

A new developer setting up a local environment? They run one migrate command and have the latest database schema from master.

A production database? A new migration is pushed to master and automatically applied (assuming thorough code review and passing CI, of course).

You can even have multiple database environments. If you want to know whether your database is “up to date”, you can compare the latest migration version file, with the latest version in the migrations table (by just running flyway info)!

Repeatable Migrations

Most of the time you will use versioned migrations.
Aside from versioned migrations, we have repeatable migrations. They do not have a version and start with R, instead of V. For example: R__Best_seller_view.sql.

When creating database views or functions, you want to recreate them every time something changes. This is exactly what a repeatable migration does.

  1. We create a R__[description].sql migration
  2. We run migrate
  3. Flyway sees a new repeatable migration and applies it, saving the checksum in its version table.
  4. We run migrate again, nothing happens
  5. We update the repeatable migration content, by adding a column to our view or parameter to our function
  6. We run migrate again, Flyway sees that the checksum has changes and reruns the migration SQL.

This is why repeatable migrations need to be idempotent. Let's create a repeatable migration that holds our book genres.

$ touch R__Add_book_genres.sql

It's important to note that repeated migrations do not have a version attached to them. This is to make it clear that they are always ran, and not just applied once.

CREATE TABLE IF NOT EXISTS book_genres
(
    id   SERIAL PRIMARY KEY,
    name VARCHAR(50) UNIQUE NOT NULL
);

INSERT INTO book_genres (name)
VALUES ('Fiction'),
       ('Non-fiction')
ON CONFLICT (name) DO NOTHING;

This adds a new table and 2 genres, fiction and non-fiction. When we run the migration, we can see a new table is created and populated:

If we run the migrations again, nothing happens:

Successfully validated 3 migrations (execution time 00:00.024s)
Current version of schema "public": 20240926210803
Schema "public" is up to date. No migration necessary.

But if we add update our R__Add_book_genres file with a new genre to our book_genres table:

INSERT INTO book_genres (name)
VALUES ('Fiction'),
       ('Non-fiction'),
       ('History') -- new 
ON CONFLICT (name) DO NOTHING;
$ docker compose run flyway migrate
...
Successfully validated 4 migrations (execution time 00:00.025s)
Current version of schema "public": 20240926210803
Migrating schema "public" with repeatable migration "Add book genres"
WARNING: DB: relation "book_genres" already exists, skipping (SQL State: 42P07 - Error Code: 0)
Successfully applied 1 migration to schema "public" (execution time 00:00.009s)

We now have a new entry:

Using an Existing Database

So far we have learned how to create both versioned and repeatable migrations. Starting from a new database is great for learning purposes, but there are many existing databases out there that are not versioned. Can we still use migrations?

The answer is yes! The mistakes of the past do not have to haunt us today. Adding versioning to an existing database is an excellent idea.

To demonstrate, we are going to delete all our migration files. All our tables are still there: book, author, book_author and book genres. We will drop the migrations table too, to make it a realistic example.

$ rm flyway/migrations/*.sql
DROP TABLE flyway_schema_history;

Clean slate! Let’s inspect using flyway info.

$ docker compose run flyway info 
Database: jdbc:postgresql://postgres:5432/books (PostgreSQL 16.3)
Schema version: << Empty Schema >>

+----------+---------+-------------+------+--------------+-------+----------+
| Category | Version | Description | Type | Installed On | State | Undoable |
+----------+---------+-------------+------+--------------+-------+----------+
| No migrations found                                                       |
+----------+---------+-------------+------+--------------+-------+----------+

Alright, looks good! Time to run migrate to install Flyway:

$ docker compose run flyway migrate
Database: jdbc:postgresql://postgres:5432/books (PostgreSQL 16.3)
Schema history table "public"."flyway_schema_history" does not exist yet
ERROR: Found non-empty schema(s) "public" but no schema history table. Use baseline() or set baselineOnMigrate to true to initialize the schema history table.

We can’t run migrate on an existing database. As the error indicates, we need to set a baseline. The baseline is your starting point. If you have an existing database you need to tell Flyway: "Hey, we start from here."

The base migration is a versioned migration file that contains your entire database schema. In Postgres, we can use pgdump to export our database. If you use an IDE or a datatbase management GUI, it usually has an option to export your database.

We have been using the command line so far, so let's keep doing that. We'll create our first migration file called "initial database". Because it uses the date/time as the version, every migration you create after this one will be applied in the correct order.

$ fly "Initial database"
Created: V20240927091411__Initial_database.sql
$ pg_dump -U postgres -h localhost books > V20240927091411__Initial_database.sql

We use output redirection (>) to write the output of pg_dump into our newly created migration. Remember that the number after V is our ID. We have just created a migration with ID 20240927091411. We can now set our baseline:

$ docker compose run flyway baseline -baselineVersion=20240927091411
...
Creating Schema History table "public"."flyway_schema_history" with baseline
Successfully baselined schema with version: 20240927091411

When checking our schema now, we’ll have a baseline:

$ docker compose run flyway info
Schema version: 20240927091411

+----------+----------------+-----------------------+----------+---------------------+----------+----------+
| Category | Version        | Description           | Type     | Installed On        | State    | Undoable |
+----------+----------------+-----------------------+----------+---------------------+----------+----------+
|          | 20240927091411 | << Flyway Baseline >> | BASELINE | 2024-09-27 14:20:42 | Baseline | No       |
+----------+----------------+-----------------------+----------+---------------------+----------+----------+

Baseline created. we can now follow the steps from Creating Migrations and Applying Versioned Migrations to keep your database versioned!

Let's create one more migration for fun. Just to see everything works as expected. Our book table does not have a publish_date yet, so let's fix that.

$ fly "Add publish date column to book"
Created: V20240927095941__add_publish_date_column_to_book.sql

Add the column to our table with a simple SQL statement:

ALTER TABLE books
    ADD COLUMN publish_date DATE;

Apply it!

$ docker compose run flyway migrate
...
Current version of schema "public": 20240927091411
Migrating schema "public" to version "20240927095941 - Add publish date column to book"
Successfully applied 1 migration to schema "public", now at version v20240927095941 (execution time 00:00.007s)

Magic:

Done!

Hopefully you see the power of database versioning and migrations. Back in the day we would send our .SQL files to a database team, who would then manually run it against the database. That seems so inefficient and error prone now.