Learn Why and How to Use Relational Database Migrations

Written by artemsutulov | Published 2022/07/16

When developing backend services, it’s effortless to create problems if database integration is implemented incorrectly. This article will tell you some best practices for working with relational databases in modern services and also will show you that automatically generating and keeping up-to-date schema is arguably not a good idea.

I will use Flyway for database migrations, Spring Boot for easy setup, and H2 as an example database.
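
If you want to follow along, the setup is minimal: add the flyway-core dependency to a Spring Boot project and point it at an H2 database. Here is a sketch of the relevant application.properties; the URL and values are illustrative, not taken from the example project:

# Illustrative H2 datasource; Flyway runs automatically when flyway-core is on the classpath
spring.datasource.url=jdbc:h2:mem:usersdb
spring.datasource.driver-class-name=org.h2.Driver
# Flyway picks up scripts from classpath:db/migration by default
spring.flyway.enabled=true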

I won’t cover the basics of what migrations are and how they work; Flyway’s documentation has good introductory articles on that.

The problem

A long time ago, developers initialized and updated databases by applying scripts manually, separately from the application. Hardly anyone does this anymore because it’s hard to develop and maintain in a proper state, which leads to severe trouble.

Nowadays, developers mostly use two approaches:

  1. Automatic generation, e.g., JPA or Hibernate - the database is initialized and kept up to date by comparing entity classes with the current DB state; if changes are needed, they are applied.

  2. Database migrations - developers update the database incrementally, and the changes are applied automatically on startup.

    Also, if we talk about Spring, there’s basic database initialization out of the box, but it’s way less advanced than tools such as Flyway or Liquibase.
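
That built-in mechanism simply runs schema.sql and data.sql from the classpath on startup. A sketch of the standard Spring Boot property that controls it (the value shown is its default, which limits initialization to embedded databases):

# Runs classpath:schema.sql and classpath:data.sql on startup; "embedded" is the default mode
spring.sql.init.mode=embedded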

Hibernate automatic generation

To demonstrate how it works, let’s use a simple example: a users table with three fields - id, user_name, and email.
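
In plain DDL, the target table looks roughly like this (a sketch; it matches the Flyway script we will write later in the article):

create table users
(
    id        uuid         not null primary key,
    user_name varchar(64)  not null,
    email     varchar(128)
);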

Now let’s have a look at the schema automatically generated by Hibernate.

Hibernate entity:

@Entity
@Table(name = "users")
public class User {
    @Id
    @GeneratedValue
    private UUID id;

    @Column(name = "user_name", length = 64, nullable = false)
    private String userName;

    @Column(name = "email", length = 128, nullable = true)
    private String email;
}

To make Hibernate keep the schema up to date, we need this line in the Spring Boot config, and it starts doing so on startup:

spring.jpa.hibernate.ddl-auto=update

And the log from Hibernate when the application starts:

Hibernate: create table users (id binary(255) not null, email varchar(128), user_name varchar(64) not null, primary key (id))

After generation, we can see that it created id as binary with a maximum size of 255, which is far more than necessary - a UUID needs only 16 bytes in binary form (36 characters as text). So we want the UUID type instead, but Hibernate doesn’t generate it that way by default. It can be fixed by adding this annotation:

@Column(name = "id", columnDefinition = "uuid")

However, now we are writing an SQL definition into the column mapping, which breaks the abstraction between Java and SQL.
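
With that annotation, the id mapping would look something like this (a sketch of the adjusted field):

@Id
@GeneratedValue
@Column(name = "id", columnDefinition = "uuid")
private UUID id;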

And let’s fill the table with some users:

insert into users (id, user_name, email)
values ('297a848d-d406-4055-8a6f-4a4118a44001', 'Artem', null);
insert into users (id, user_name, email)
values ('921a9d42-bf14-4c3f-9893-60f79cdd0825', 'Antonio', 'antonio@gmail.com');

Adding a new column

Let’s imagine that after some time we want to add notifications to our app and, consequently, track whether a user wants to receive them. So we decide to add a receive_notifications column to the users table and make it non-nullable.

That means that in Hibernate entity, we add the new column:

@Column(name = "receive_notifications", nullable = false)
private Boolean receiveNotifications;

After starting the app, we see an error in the logs and no new column. That’s because the table is not empty, and we need to set a default value for the existing rows:

Error executing DDL "alter table users add column receive_notifications boolean not null" via JDBC Statement

We can set a default value by adding an SQL column definition again:

columnDefinition = "boolean default true"
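
On the entity, that means the new field carries the SQL default in its mapping (a sketch combining the annotations above):

@Column(name = "receive_notifications", nullable = false,
        columnDefinition = "boolean default true")
private Boolean receiveNotifications;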

And from Hibernate logs, we can see that it worked:

Hibernate: alter table users add column receive_notifications boolean default true not null

However, let’s imagine we needed receive_notifications to be something more complex - for example, true or false depending on whether the email is filled in. It’s impossible to implement that logic with Hibernate alone, so we need migrations anyway.

To sum up, the main drawbacks of the automatically generated and updated schema approach:

  1. It is Java-first and consequently not flexible in terms of SQL: it’s unpredictable and sometimes doesn’t do SQL things the way you expect. You can write SQL definitions to steer it, but that’s limited compared to pure SQL DDL.

  2. Sometimes it’s impossible to update existing tables or transform data this way, and we need SQL scripts anyway. In most cases, this ends up as automatic schema updating plus migrations for updating data. It’s easier to avoid automatic generation altogether and do everything related to the database layer in migrations.

    Also, it’s not convenient for parallel development because it doesn’t support versioning, and it’s tough to tell what’s going on with the schema.

Solution

Here is how it looks without automatic schema generation and updating:

Script for initializing DB:

resources/db/migration/V1__db_initialization.sql

create table if not exists users
(
    id        uuid        not null primary key,
    user_name varchar(64) not null,
    email     varchar(128)
);

Filling the database with some users:

resources/db/migration/V2__users_some_data.sql

insert into users (id, user_name, email)
values ('297a848d-d406-4055-8a6f-4a4118a44001', 'Artem', null);

insert into users (id, user_name, email)
values ('921a9d42-bf14-4c3f-9893-60f79cdd0825', 'Antonio', 'antonio@gmail.com');

Adding the new field and setting a non-trivial default value for existing rows:

resources/db/migration/V3__users_add_receive_notification.sql

alter table users
    add column if not exists receive_notifications boolean;

-- Not really safe with a huge amount of data, but fine for this example
update users
set receive_notifications = email is not null;

alter table users
    alter column receive_notifications set not null;

And nothing stops us from still using Hibernate if we choose to. In the config, we set this property:

spring.jpa.hibernate.ddl-auto=validate

Now Hibernate won’t generate anything; it will only check whether the Java representation matches the database. Moreover, we no longer need to mix Java and SQL to steer Hibernate’s generation, so the entity can stay concise and free of extra responsibility:

@Entity
@Table(name = "users")
public class User {
    @Id
    @Column(name = "id")
    @GeneratedValue
    private UUID id;

    @Column(name = "user_name", length = 64, nullable = false)
    private String userName;

    @Column(name = "email", length = 128, nullable = true)
    private String email;

    @Column(name = "receive_notifications", nullable = false)
    private Boolean receiveNotifications;
}

How To Use Migrations Right

  1. Every migration must be idempotent, meaning that if a migration is applied several times, the database state stays the same. If we ignore that, we can end up with errors after rollbacks, or with pieces that never get applied and lead to failures. In most cases, idempotency is easily achieved by adding checks like if not exists / if exists, as we did above.
  2. When writing DDL, it’s better to put as much as reasonably possible into one migration rather than creating several. The main reason is readability: it’s much better when related changes, made in one pull request, live in one file.
  3. Don’t change already existing migrations. It’s an obvious but essential rule: once a migration is written, merged, and deployed, it must stay untouched. Any related changes must be done in a separate one.
  4. Each developer needs a separate environment, usually a local one. If migrations are applied to a shared environment during development, failures will follow later because of the way migration tools work.
  5. It’s convenient to have integration tests that run all the migrations against a test database and check that everything works; a sketch of such a test is shown after this list. These tests are really handy in builds that verify a pull request before merging, and a lot of elementary mistakes can be avoided. The example project includes integration tests that do this check out of the box.
  6. It’s better to use the V{version+=1}__description.sql pattern for naming migrations instead of V{datetime}__description.sql. The date-based naming is convenient and helps avoid version-number conflicts in parallel development, but sometimes it’s better to have a naming conflict than to silently apply migrations without developers controlling the versions.
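
Here is a minimal sketch of such a test, assuming Spring Boot’s test support, JUnit 5, and an H2 test database; the class name and query are illustrative, not taken from the example project:

import static org.assertj.core.api.Assertions.assertThat;

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.jdbc.core.JdbcTemplate;

@SpringBootTest
class MigrationsIntegrationTest {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    @Test
    void allMigrationsApplyAndSchemaIsUsable() {
        // Starting the Spring context already ran Flyway against the test database,
        // so a simple query is enough to confirm the migrated schema is usable.
        Integer userCount = jdbcTemplate
                .queryForObject("select count(*) from users", Integer.class);

        assertThat(userCount).isNotNull();
    }
}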

Conclusion

That was a lot of information, but I hope you’ll find it helpful. If you use automatic schema generation and updating, take a close look at what’s going on with the schema, because it can behave unexpectedly. And it’s always a good idea to add as much definition as possible to steer it.

But next time, it’s better to consider migrations: they relieve Java entities of excess responsibility and give you a lot of control over DDL.

To sum up best practices:

  • Write idempotent migrations.
  • Test all migrations together on a test database by writing integration tests.
  • Keep related changes in one file.
  • Each developer needs their own DB environment.
  • Keep a close eye on versions when writing migrations.
  • Don’t change already existing ones.

You can find the fully working example on GitHub.

