Engineering

Zero downtime database migrations

October 27, 2020

Russell Smith

Contents

All developers will at some time or other have to deploy code that changes the structure of a database. You might have downtime during these deploys due to database changes being required by the new code that you just deployed. There are easy ways to work around those downtimes and provide you a real zero downtime deployment process.

How does that work? I found out that to explain the basics of how to handle database migrations properly, it works better with some examples.

Before jumping to the examples, let me tell you that we, at Rainforest, are using Rails and its built-in database migration system, so this might be slightly different for you but the idea is the same. We also use Postgres which doesn’t create the obstacles that some other database systems have, for example with locking tables on schema changes. If you use MySQL, you should definitely take a look at the Large Hadron Migrator which is used by a bunch of people including GitHub and Shopify.

Before we get started, if you remember one thing from this blog post it should be plan your database migrations carefully. It takes more time to do migrations with no downtime but it is worth it.

Contents

Case #1: I want to add a new column

This is an easy one.

Before, you would have added a new column, let’s call it ‘status’ and deployed the code at the same time. For a brief moment, your could have tried to insert data in a column that is not existing (yet) and experience errors which could make your application unusable. Let’s see how we can do this safely without impacting our clients.

Add a migration to create that new column.
Deploy that new code with this migration, after getting through your usual process (feature branch, pull request, code review, QA, etc.).
Run the migration to create the new column.
Get back to your code editor and add your new feature using that new column.
Ship it.

As you can see, we did not add code to use it in the first place but just added the new column, and then in a second phase we actually started to use it. There was no downtime and no client was harmed in the making of this example.

Case #2: I want to change a column type

This is already way more complex.

Let’s say you have a column for which you want to change the type. There could be a couple of reasons for that, but let’s say you have a “text” column you are using to store serialized JSON. Hey, this could be a native Postgres JSON column, right? Let’s do the change.

We’ll have to go through a few phases here.

Phase 1

Add a migration to create the new column called “my_column_JSON”
Ship it!

Phase 2

Add a migration to migrate all the data from “my_column” to the new column. You should probably do it in chunks. What we do at Rainforest is usually kick off a background job that will do the migration in batches of 100 or 1000. Once that background job is done, it’s kicking off another one, and so on. The idea is to prevent any transactions taking too long. Again, the idea is to have little to no impact on clients and overall performance.
As part of this step, you also need to add a trigger of some sort updating the JSON version when a row is added or updated. In Rails, this could be as simple as adding this to your model: ruby class MyModel serialize :my_column before_save do my_column_JSON = my_column end end
Ship it!

Phase 3

Change the code to be aware of the new column and use it but also smart enough to use the original column name once the switch from one to the other is done.

class MyModel
  def my_column
    if self.has_attribute?(:my_column_JSON)
      self.my_column_JSON
    else
      self.attributes[:my_column]
    end
  end
  
  def my_column=(value)
    if self.has_attribute?(:my_column_JSON)
      self.my_column_JSON = value
    else
      self.attributes = { my_column: value }
    end
  end
end

Ship it!

Phase 4

Rename the columns: the old one to “my_column_deprecated_27_06_2014” and the new one to “my_column”.
Ship it!

Phase 5

Remove all the hacks from the phase 3.
Ship it!

We’re done, you can move on to your next feature!

Case #3: I want to rename a column

You probably got the point now, right? So let’s make this one more straightforward.

Create the column with the new name you want
Add some sort of trigger to fill the new column as the new rows are created (as shown in the previous example)
Migrate all the current data
Change the code to use that new column
Rename the old column to a “my_old_column_deprecated”
Profit!

Bonus: I want to delete a column

“Hey JP, there is no downtime usually when I do this!”. Correct. I just wanted to tell you a little trick we use at Rainforest. Don’t you like free tips?

If you just remove the column, well, this is not reversible and you can’t rollback as the data will have been deleted.

It’s simple: don’t delete it just yet – just rename the column to “ZZZ_my_column_deprecated_20201007” and boom, you can rollback! ZZZ also means it won’t show first, and is easy to find.

Some more tips

Always test your migrations against either a copy of your production data or something that is roughly equivalent. You can see how long it takes, the kind of performance impact the migration will have and find solutions to these impacts if required.
When adding an index, with Postgres, use “CONCURRENTLY” which will do the job without taking any locks.
Again, if you use MySQL, you will most probably want to use Large Hadron Migrator or something similar.
Beware of column caches in ORMs, they could be a source of problems and/or confusion.

I hope that will help you to get your deployment process as frictionless as possible.

Thanks for reading.

Find more bugs, faster, without adding headcount

Rainforest gives your team QA superpowers.

Spend more time building product, not bashing bugs.

Talk to us

Talk to Sales