Our basic process and tools for shipping code at Rainforest haven't changed much in the last year or so: we use GitHub to host our code, CircleCI for CI and continuous deployment, and Heroku for our infrastructure. We have, however, kept tweaking and improving things as needed. Most notably, we recently automated a few more parts of our deployment process and open-sourced the results as a gem called Circlemator (pull requests welcome!).
It should come as no surprise that we're big fans of continuous delivery here at Rainforest, and we use it as the basis of our deployment strategy. But what does continuous delivery mean in practice? For us, it has two equally important aspects:
What makes a roadblock good or bad? "Good roadblocks" are hurdles that prevent bugs and low-quality code from reaching production, such as code review, unit tests, and of course QA 😉. Bad roadblocks, on the other hand, are all the "button-pushing" and bureaucratic activities that traditional deployment processes are rife with: cutting releases, copying code, manually restarting servers, tweaking configurations, and the like.
A Rainforest code commit has to jump through quite a few hoops before it makes it to production:
(For more details on our branching strategy and how we do code reviews, see our blog post on how we use version control.)
It's worth noting that these steps involve a mixture of human intervention and automation: steps 1 and 2 are necessarily manual, while steps 3, 6, and 8 are easily automatable with a CI server such as CircleCI. But what about 4 and 7? Should code be merged automatically? And is step 5 even necessary at all?
How automatic your deployment process should be depends on a number of factors, such as your risk tolerance and your confidence in your testing and monitoring tools. We've kept step 4 (merging to develop) manual for now, since automating it offers limited benefits and invites potential mishaps.
Merging to master (step 7), on the other hand, was until recently a "bad roadblock": it generally involved a developer keeping tabs on the build and hitting the merge button once it finished. Even worse, sometimes a build would succeed and nobody would merge it, which meant lost opportunities to ship.
Since code review takes place before code hits develop, the final merge wasn't really adding any assurances; it seemed like a good candidate for automation. To eliminate the roadblock, we wrote a small utility (part of Circlemator) to self-merge the release pull request at the end of the build. This may seem like a small change, but it's had a surprisingly positive impact on our release cadence.
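For readers curious about what "self-merging" involves mechanically: GitHub's REST API exposes a merge endpoint for pull requests (`PUT /repos/{owner}/{repo}/pulls/{pull_number}/merge`). The Python sketch below only builds that request. It is not Circlemator's code (Circlemator is a Ruby gem), the helper name and parameters are hypothetical, and the token is assumed to come from a CI secret. On CircleCI, the owner and repository could be read from the built-in `CIRCLE_PROJECT_USERNAME` and `CIRCLE_PROJECT_REPONAME` environment variables.

```python
import json
import urllib.request


def build_merge_request(owner: str, repo: str, pr_number: int, token: str,
                        head_sha: str) -> urllib.request.Request:
    """Build (but don't send) a GitHub API request that merges a pull request.

    Hypothetical helper: a CI job would call this at the end of a green build.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}/merge"
    payload = {
        # GitHub only merges if the PR head still matches the commit we tested.
        "sha": head_sha,
        "merge_method": "merge",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="PUT",
    )
```

Sending the request with `urllib.request.urlopen(req)` performs the merge. Passing `sha` makes GitHub refuse the merge if new commits landed on the branch after the build started, so only the code that was actually tested gets shipped.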
There are still some situations where releasing automatically could be a bad idea (for instance, at the end of the day, when no one will be around to monitor production). We kept step 5 as a compromise: if no release pull request is open, Circlemator won't merge to master. This keeps our development cadence running smoothly while still letting us hold back production releases under special circumstances.
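The gate in step 5 boils down to a small decision: merge only when the develop build is green *and* a release pull request from develop to master is open. Here is a minimal Python sketch of that check, assuming the open pull requests have already been fetched from GitHub's list-pulls endpoint (each entry carries `number`, `head.ref`, and `base.ref`). The function names are hypothetical and not part of Circlemator.

```python
from typing import Optional


def find_release_pr(open_pulls: list[dict], head: str = "develop",
                    base: str = "master") -> Optional[int]:
    """Return the number of the open develop -> master pull request, if any."""
    for pr in open_pulls:
        if pr["head"]["ref"] == head and pr["base"]["ref"] == base:
            return pr["number"]
    return None


def release_decision(branch: str, build_passed: bool,
                     open_pulls: list[dict]) -> Optional[int]:
    """Return the PR number to merge, or None if this build shouldn't release."""
    if branch != "develop" or not build_passed:
        return None  # only green develop builds are release candidates
    # No open release PR means someone has deliberately paused releases.
    return find_release_pr(open_pulls)
```

With this shape, closing the release pull request (for example, late in the day) is all it takes to pause automatic releases.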
Code review is an inherently manual process, since it involves human judgment. But that doesn't mean it can't be streamlined with a healthy dose of automation.
Typical reviews often include a fair amount of bikeshedding about code style and formatting, as well as checking for common "gotchas". To minimize trivialities and keep code reviews focused on important issues, we introduced automatic style checking through RuboCop for our Ruby projects. We use a standardized style configuration (open-sourced as rf-stylez) and added a task to Circlemator that comments on pull requests when there are style violations. (There are commercial products that do similar things, but we found the open source Pronto gem to be sufficient for our needs.)
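To make the idea concrete, here is a sketch of turning RuboCop's JSON report (`rubocop --format json`) into pull request comments. It is a hypothetical illustration, not how Pronto or Circlemator work internally: it only comments on files the pull request touched, whereas Pronto goes further and restricts comments to changed lines. The output dictionaries mirror the `path`, `line`, and `body` fields of GitHub's review comment API; actually posting them is left out.

```python
import json


def offenses_to_comments(rubocop_json: str, changed_files: set[str]) -> list[dict]:
    """Convert a RuboCop JSON report into comment dicts for changed files only."""
    report = json.loads(rubocop_json)
    comments = []
    for file_report in report.get("files", []):
        path = file_report["path"]
        if path not in changed_files:
            continue  # don't nag authors about code they didn't touch
        for offense in file_report.get("offenses", []):
            comments.append({
                "path": path,
                "line": offense["location"]["line"],
                "body": f"{offense['cop_name']}: {offense['message']}",
            })
    return comments
```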
Code reviews, while awesome, can quickly turn into a major bottleneck if they're "hierarchical", i.e., if each review has to come from a developer more senior than the code author. To avoid this, our code review process (like the rest of our development process) is based almost entirely on peer feedback.
Still, any codebase will have a few areas that require extra scrutiny; a peer code review may not be enough for security-related code, for instance. To make sure changes to safety-critical code don't accidentally slip through, we wrote a small utility called Commentator (naturally included in Circlemator) that adds checklists to pull requests when particular files change. For instance, any change to our payment code has to be reviewed by two developers, including our CTO.
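The core of a tool like Commentator is a mapping from file patterns to checklists. The sketch below is hypothetical: the patterns, checklist wording, and function name are ours for illustration and are not Commentator's configuration format. It returns a Markdown task list to post on the pull request whenever a sensitive file changes. Note that with `fnmatch`, `*` also matches `/`, so `app/payments/*` covers nested directories too.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical rules: glob pattern -> checklist items required when it matches.
CHECKLIST_RULES = {
    "app/payments/*": [
        "Reviewed by two developers",
        "One of the reviewers is the CTO",
    ],
    "lib/auth/*": [
        "Reviewed for security implications",
    ],
}


def build_checklist(changed_files: list[str],
                    rules: dict[str, list[str]] = CHECKLIST_RULES) -> Optional[str]:
    """Return a Markdown task list for the matched rules, or None if none match."""
    items: list[str] = []
    for pattern, checklist in rules.items():
        if any(fnmatchcase(path, pattern) for path in changed_files):
            for item in checklist:
                if item not in items:  # avoid duplicate checklist lines
                    items.append(item)
    if not items:
        return None
    return "\n".join(f"- [ ] {item}" for item in items)
```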
Our deployment strategy works well overall, but there's still plenty of room for improvement. Here are some improvements we're planning to implement:
Our unit test suite is pretty extensive and is getting slow as it grows: even heavily parallelized on CircleCI, it takes about 20 minutes. On top of that, our Rainforest test suite takes another 20 minutes on average. As a result, a commit takes about an hour and a half to reach production after it's pushed, even in the best case. That's acceptable for most features but on the slow side for bug fixes, and it increases the temptation to bypass the process in "special circumstances", something we try to avoid whenever possible.
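With a parallelized suite, the build time is the wall-clock time of the slowest container, so how test files are distributed matters. CircleCI can split files using recorded timings (`circleci tests split --split-by=timings`); the sketch below illustrates the general idea with a simple greedy heuristic (longest file first, onto the least-loaded container). It is an illustration under our own assumptions, not CircleCI's actual algorithm, and the function names are hypothetical.

```python
import heapq


def split_by_timings(timings: dict[str, float], containers: int) -> list[list[str]]:
    """Assign test files to containers, longest first, each to the least-loaded one."""
    heap = [(0.0, i) for i in range(containers)]  # (seconds assigned, container index)
    buckets: list[list[str]] = [[] for _ in range(containers)]
    # Sort by descending duration; ties broken by name for deterministic output.
    for name, seconds in sorted(timings.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        buckets[idx].append(name)
        heapq.heappush(heap, (load + seconds, idx))
    return buckets


def wall_clock(timings: dict[str, float], buckets: list[list[str]]) -> float:
    """The parallel build finishes when its slowest container does."""
    return max(sum(timings[name] for name in bucket) for bucket in buckets)
```

For example, four files taking 10, 8, 6, and 4 seconds split across two containers end up as {10, 4} and {8, 6}, so the build takes 14 seconds instead of the 28 a single container would need.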
There are a few steps we're planning to take in the near future to speed things up:
Ideally, I'd like to get our feature branch builds under 10 minutes and our develop builds down to 20 minutes.
So far, all of our automated "code hurdles" focus on quality and bug prevention; we don't have any checks for runtime speed. Integrating an automated benchmark suite into the build would be a great way to make sure we don't inadvertently introduce performance regressions, and also to confirm that we're hitting our overall production performance targets.
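A benchmark gate could be as simple as comparing each benchmark's timing against a stored baseline and failing the build when anything slows down by more than a tolerance. This is a hypothetical sketch of the comparison step only: the 10% tolerance, function names, and benchmark names are placeholders, and collecting stable timings on shared CI hardware is the harder part in practice.

```python
import sys


def find_regressions(baseline: dict[str, float], current: dict[str, float],
                     tolerance: float = 0.10) -> dict[str, float]:
    """Return {benchmark: fractional slowdown} for benchmarks beyond the tolerance."""
    regressions = {}
    for name, base_seconds in baseline.items():
        if name not in current or base_seconds <= 0:
            continue  # skip benchmarks missing from this run or with unusable baselines
        slowdown = current[name] / base_seconds - 1.0
        if slowdown > tolerance:
            regressions[name] = slowdown
    return regressions


def check(baseline: dict[str, float], current: dict[str, float],
          tolerance: float = 0.10) -> int:
    """Report regressions on stderr and return an exit code for the CI step."""
    regressions = find_regressions(baseline, current, tolerance)
    for name, slowdown in sorted(regressions.items()):
        print(f"{name}: {slowdown:.0%} slower than baseline", file=sys.stderr)
    return 1 if regressions else 0
```

A CI step would then end with `sys.exit(check(baseline, current))`, so a non-zero exit fails the build like any other test.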
Right now our code is released in batches from our develop branch, so each release typically includes a number of pull requests (and merging a new pull request to develop cancels any previous develop builds). We've considered moving to a "train" model instead, where multiple develop builds can run in parallel and be released sequentially. There are a number of technical challenges involved, however, so we're not sure whether it's worth the development effort at the moment (improving build speed seems like lower-hanging fruit).
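To illustrate the ordering constraint in a train model: builds may finish in any order, but releases have to go out in the order the code was merged. The sketch below is a hypothetical in-memory model (the class and method names are ours) that holds finished builds until every earlier build has finished, then releases the passing ones in sequence. It deliberately sidesteps one of the real challenges: since each develop build includes all earlier commits, releasing a passing build that follows a failed one also ships the failed build's changes, which a real implementation would need a policy for.

```python
class ReleaseTrain:
    """Release develop builds in merge order, even if they finish out of order."""

    def __init__(self) -> None:
        self.next_seq = 0                    # lowest build number not yet released or skipped
        self.finished: dict[int, bool] = {}  # build number -> passed?
        self.released: list[int] = []

    def build_finished(self, seq: int, passed: bool) -> list[int]:
        """Record a finished build; return every build released so far."""
        self.finished[seq] = passed
        # Advance through consecutive finished builds, starting at next_seq.
        while self.next_seq in self.finished:
            if self.finished.pop(self.next_seq):
                self.released.append(self.next_seq)  # a deploy would happen here
            self.next_seq += 1
        return list(self.released)
```

For instance, if build 1 finishes before build 0, it waits; once build 0 passes, both go out in order.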
We're pretty happy with our deployment strategy: it strikes a balance between shipping speed and quality control that we're comfortable with for now. Every development team is different, though, and we're always curious to know how other teams deploy code. In particular, we're always looking for more things to automate (assuming it's worth the effort). If you can think of anything, let us know!