Zero downtime database migrations

Last updated in October 2020.

All developers will at some time or other have to deploy code that changes the structure of a database. You might have downtime during these deploys due to database changes being required by the new code that you just deployed. There are easy ways to work around those downtimes and provide you a real zero downtime deployment process.

How does that work? I found out that to explain the basics of how to handle database migrations properly, it works better with some examples.

Before jumping to the examples, let me tell you that we, at Rainforest, are using Rails and its built-in database migration system, so this might be slightly different for you but the idea is the same. We also use Postgres which doesn't create the obstacles that some other database systems have, for example with locking tables on schema changes. If you use MySQL, you should definitely take a look at the Large Hadron Migrator which is used by a bunch of people including GitHub and Shopify.

Before we get started, if you remember one thing from this blog post it should be plan your database migrations carefully. It takes more time to do migrations with no downtime but it is worth it.

Case #1: I want to add a new column

This is an easy one.

Before, you would have added a new column, let's call it 'status' and deployed the code at the same time. For a brief moment, your could have tried to insert data in a column that is not existing (yet) and experience errors which could make your application unusable. Let's see how we can do this safely without impacting our clients.

  1. Add a migration to create that new column.
  2. Deploy that new code with this migration, after getting through your usual process (feature branch, pull request, code review, QA, etc.).
  3. Run the migration to create the new column.
  4. Get back to your code editor and add your new feature using that new column.
  5. Ship it.

As you can see, we did not add code to use it in the first place but just added the new column, and then in a second phase we actually started to use it. There was no downtime and no client was harmed in the making of this example.

Case #2: I want to change a column type

This is already way more complex.

Let's say you have a column for which you want to change the type. There could be a couple of reasons for that, but let's say you have a "text" column you are using to store serialized JSON. Hey, this could be a native Postgres JSON column, right? Let's do the change.

We'll have to go through a few phases here

Phase 1

  1. Add a migration to create the new column called "my_column_JSON"
  2. Ship it!

Phase 2

  1. Add a migration to migrate all the data from "my_column" to the new column. You should probably do it in chunks. What we do at Rainforest is usually kick off a background job that will do the migration in batches of 100 or 1000. Once that background job is done, it's kicking off another one, and so on. The idea is to prevent any transactions taking too long. Again, the idea is to have little to no impact on clients and overall performance.
  2. As part of this step, you also need to add a trigger of some sort updating the JSON version when a row is added or updated. In Rails, this could be as simple as adding this to your model: ruby class MyModel serialize :my_column before_save do my_column_JSON = my_column end end
  3. Ship it!

Phase 3

  • Change the code to be aware of the new column and use it but also smart enough to use the original column name once the switch from one to the other is done.
  • Ship it!

Phase 4

  1. Rename the columns: the old one to "my_column_deprecated_27_06_2014" and the new one to "my_column".
  2. Ship it!

Phase 5

  1. Remove all the hacks from the phase 3.
  2. Ship it!

We're done, you can move on to your next feature!

Case #3: I want to rename a column

You probably got the point now, right? So let's make this one more straightforward.

  1. Create the column with the new name you want
  2. Add some sort of trigger to fill the new column as the new rows are created (as shown in the previous example)
  3. Migrate all the current data
  4. Change the code to use that new column
  5. Rename the old column to a "my_old_column_deprecated"
  6. Profit!

Bonus: I want to delete a column

"Hey JP, there is no downtime usually when I do this!". Correct. I just wanted to tell you a little trick we use at Rainforest. Don't you like free tips?

If you just remove the column, well, this is not reversible and you can't rollback as the data will have been deleted.

It's simple: don't delete it just yet - just rename the column to "ZZZ_my_column_deprecated_20201007" and boom, you can rollback! ZZZ also means it won't show first, and is easy to find.

Some more tips

  • Always test your migrations against either a copy of your production data or something that is roughly equivalent. You can see how long it takes, the kind of performance impact the migration will have and find solutions to these impacts if required.
  • When adding an index, with Postgres, use "CONCURRENTLY" which will do the job without taking any locks.
  • Again, if you use MySQL, you will most probably want to use Large Hadron Migrator or something similar.
  • Beware of column caches in ORMs, they could be a source of problems and/or confusion.
  • I hope that will help you to get your deployment process as frictionless as possible.

    Thanks for reading.

    Related articles

    How to remove Recompose and replace with Hooks

    In our last post, we explored the pros and cons of Recompose and why we decided to remove it from our codebase. This post includes the strategy we used to approach the large task of implementing that refactor.

    Test your JavaScript with Jasmine part 2

    I am going to show you a few more things that will make you more efficient at using Jasmine to test your JavaScript.

    Using a CircleCI build server as an ad-hoc QA environment

    If you like many others don’t have an optimal environment setup, you are likely lacking an isolated QA environment for your integration tests.

    Optimal Environment Setup

    In this post we're going to look at optimal environments for webapps. This is part two in a series - the first post looks at what are environments for?