Getting to Continuous Delivery

Paul Burt, Wednesday July 8, 2015

If you can do continuous delivery (CD), you should do continuous delivery. Anyone want to argue with that? This is the Internet, so yes.

Smart choice, Internet. Cargo cult programming is dangerous. Take a moment and read a bit about CD if it isn't already familiar to you. Sounds neat, right? Great, read on.

When you do CD, your engineers need to maintain a fluid and fast testing process. If you already do this effectively, then this article is over for you. Your company is ahead of 99% of the folks out there.

More likely, your company is just like the rest of us. Maybe it's already been suggested that engineers write tests. Maybe the engineers have already told you to get bent. Or, statistically more likely, they've rationalized that the unit test coverage is already pretty good and it's probably safe to deploy multiple times a day.

All of the above works fine. That is, until one day your sort-of-but-not-really-CD process smashes into a headline case.

What’s a headline case? It's a human-resources-friendly way to put a name on your “oh shit" moment, as in "Oh shit, no one can access the login page on production?!" The word “headline” refers to newspaper headlines — that's where you'll find your foibles documented if a poor release manages to crap things up badly enough.

Most folks are fine running some version of almost-but-not-really-CD until the “oh shit" moment occurs. When it does, you might start fielding CxO questions like “How the hell did this make it through QA?"

Then, unsurprisingly, you're no longer CD. Or someone takes the blame, gets fired, and the not-quite-but-sort-of-kind-of-CD process ambles on. Either way, the first headline case triggers process changes.

First, the release process slows down. In the worst case, the company visits an old friend: the two-week release. Why? A headline case means there's no way the app has sufficient test coverage. Who knows what's really going to happen on a deploy? Every release results in an inordinate amount of sweating and swearing.

This is where most would-be CD efforts run aground. Caution leads to a decision like, "We should slow down." The sagacious Martin Fowler weighs in: "[In CD,] your team prioritizes keeping the software deployable over working on new features."

There's a bit of subtext to this quote. To release, tests need to be reliable. To release continuously, tests need to be fast. In other words, CD needs CT (continuous testing).

What is testing anyway?

CT is testing that can keep up with CD. It's fast, it's maintainable, and it works.

A lot of testing philosophies can lead to CT: TDD, BDD, ATDD, integration, functional, unit, and the list goes on. Let's simplify with a controversial claim: none of these testing philosophies matter. What matters is the result. What matters is speed and reliability.

Regardless of your strategy's label, it'll need to cover the two most important categories of testing: exploratory and regression.

Exploratory testing is unguided exploration of a part of an app. This primarily helps check new features. Occasionally, it's useful to explore older features as well.

This type of testing is impossible to automate since it requires some combinatorial play. In other words, you need a human being to poke around.

The purpose of exploratory testing is to explore the unknown. Unknowns appear when you build new features. Unknowns might appear in an update to an old feature (when, say, a shared piece of code changes). Exploratory testing covers both well. It's all about discovery.

Unfortunately, it's also manual — painfully slow and manual. In my experience, the length of most exploratory testing is measured in days. This doesn't bode well for CT, since we need speed to get to CD.

Let's look at the usual quick fixes. Services like Testlio reduce some of the pain by providing a crowd of testers. At Rainforest, we like crowds of things. A crowd definitely moves faster than a single in-house resource.

That said, the process is still fundamentally manual. Even with a crowd, tests will take more than a few minutes to run. Usually, it will take on the order of half a day to two days. For most shops, exploratory testing is a crucial step — but it should not be a blocking step for a continuous process.

Regression testing should be more familiar. Do you write unit tests? That's a regression test. Do you add functional or integration tests when you find a bug? That's a regression test. Regression testing ensures everything still works the way it's supposed to.

[Image: "Oh Sheet" spreadsheet]

One way to think of a regression test suite is as a checklist. Naturally, some suites live their lives inside an Excel spreadsheet. A spreadsheet probably also means tests are executed manually. If you're angling for CT, this is the part that you really want automated.
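To make that concrete, here's what one row of that spreadsheet, say "the login page loads", might look like as an automated regression test. This is a minimal sketch using Python's unittest and the requests library; the staging URL and the specific check are invented for illustration.

    # regression_login.py -- one "oh shit" scenario, pinned down as an automated check.
    # The staging URL below is hypothetical; point it at your own environment.
    import unittest
    import requests

    BASE_URL = "https://staging.example.com"

    class LoginPageRegression(unittest.TestCase):
        def test_login_page_loads(self):
            # The page should respond successfully...
            resp = requests.get(BASE_URL + "/login", timeout=10)
            self.assertEqual(resp.status_code, 200)
            # ...and actually contain the login form, not an error page.
            self.assertIn("password", resp.text.lower())

    if __name__ == "__main__":
        unittest.main()

Run a suite of these on every push and the checklist stops being a manual gate.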

CT enables CD. To get to CD, we need fast tests. For speed, you must automate. Easy enough. Most folks believe in automation, right?

This is where most journeys to CD run off the rails. There are a thousand ways to screw it up. The rest of this article is dedicated to what I see as the two most common.

The first is expecting testing, by itself, to raise quality. The second is relying on the "throw it over the wall" strategy.

Testing raises quality

[Image: Inigo Montoya from The Princess Bride]

You keep using that word. I do not think it means what you think it means.

You'd think the more you test, the better off you are. You'd think that, and you'd be wrong. Testing won't fix your quality problems. Testing reveals your quality problems.

Start testing earnestly and you'll find all sorts of ugliness. It’s a bit counterintuitive but it’s important. A friend, Jeff Ammons, noted that this is similar to the introduction of metal helmets in World War I.

At the outset of WWI, soldiers wore cloth or leather helmets. These offered poor protection from shrapnel and other pointy objects, so the British (and other combatants) switched to steel helmets. Paradoxically, after the switch, the number of head injuries went up. Why?

Because fewer soldiers died. What once resulted in death instead became a head injury. A head injury might still eventually lead to death, so the next step is to look at treatment.

Similarly, once you have a decent QA process, expect the number of bugs (paradoxically) to skyrocket. Build a process for organizing and triaging the deluge.

Here are some quick tips to help with bug management:

  • It's okay to mark a bug as "will not fix". Just check with PMs to see if they agree. Archive rather than delete, so the bugs are still searchable.
  • Track irreproducible bugs. They're important, but they can also clutter bug queues. I like to archive these guys with the heisenbug tag. If a heisenbug appears frequently enough, it's worth spiking on.
  • Declare bug bankruptcy! If a bug reaches a certain age, it may no longer be relevant. Check the bug again if it sounds serious; if it doesn’t, archive it. A usable bug queue is more important than a comprehensive one. Talk to your team about what kind of age limit makes sense for bugs (a rough script for this follows the list).
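To make bankruptcy mechanical rather than heroic, a small throwaway script is usually enough. The sketch below is purely illustrative: the bug records, the 90-day cutoff, and archive_bug() are stand-ins for whatever your tracker actually exposes.

    # bug_bankruptcy.py -- archive bugs past an agreed-upon age.
    from datetime import datetime, timedelta

    MAX_AGE = timedelta(days=90)  # pick an age limit with your team

    # Stand-in data; in practice you'd pull these from your bug tracker.
    bugs = [
        {"id": 101, "title": "Login button misaligned", "opened": datetime(2015, 1, 12), "severity": "low"},
        {"id": 102, "title": "Payments endpoint returns 500", "opened": datetime(2015, 3, 4), "severity": "high"},
    ]

    def archive_bug(bug):
        # Stand-in: call your tracker's archive endpoint here. Archive, never delete.
        print("Archiving #{id}: {title}".format(**bug))

    for bug in bugs:
        too_old = datetime.now() - bug["opened"] > MAX_AGE
        if too_old and bug["severity"] != "high":
            archive_bug(bug)
        elif too_old:
            # Serious-sounding bugs get a second look instead of an auto-archive.
            print("Re-check #{id} before archiving: {title}".format(**bug))

None of this is clever. The point is that the queue gets pruned on a schedule instead of when someone finally snaps.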

Throw it over the wall

The “throw it over the wall” strategy is a common way to do QA. It's also astoundingly ineffective.

First, what is the “throw it over the wall” strategy? It's an engineering cycle that looks something like the following:

  1. Developer builds the thing.
  2. QA tests it (either manually or through writing scripts).
  3. Developer releases.

The term "throw it over the wall" comes from steps one and two. Between these steps context is often lost between dev and QA. To complicate things, "build the thing" and "test it" are really a loop. A slew of bugs are almost always found on the first pass. When fixes appear in the next build, the testing cycle begins anew.

The pain multiplies when one considers the effort involved in coordinating releases, resolving merge conflicts, and organizing across teams.

The problem isn't testing. The problem is the handoff of testing responsibilities. In order for testers to write tests or do the manual work, they need context. Transferring context is hard.

Another negative is how this process impacts motivation. The “throw it over the wall” approach encourages the developers to say “That's not my job.” When managers ask why a feature hasn’t shipped yet, the easy response is "It's testers holding things up." When you ask testers about it, they'll point the finger right back at dev: "They gave us buggy software."

This is a problem that some folks call the “definition of done”. It can appear in any process, but it's grossly exacerbated in the “throw it over the wall” approach.

How do we fix it? Fire all the testers? Yes and no. Testing is important. To work optimally, developers need to own testing.

But if the developers are testing, what are the testers doing?

They should be building tools that empower devs to write faster, better tests. Devs still own testing; great tooling and infrastructure just make them a lot more effective.
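Concretely, that tooling might be as humble as a shared test fixture. Here's a hedged sketch using pytest and requests; the staging URL, the qa-bot account, and the /dashboard endpoint are all made up for illustration.

    # conftest.py -- maintained by the QA/tooling folks, shared by everyone.
    import pytest
    import requests

    BASE_URL = "https://staging.example.com"  # hypothetical staging environment

    @pytest.fixture
    def api_client():
        # A session that's already logged in, so tests skip the setup tax.
        session = requests.Session()
        session.post(BASE_URL + "/login",
                     data={"user": "qa-bot", "password": "not-a-real-secret"})
        yield session
        session.close()

    # test_dashboard.py -- what a developer writes: short, fast, and focused.
    def test_dashboard_is_reachable(api_client):
        resp = api_client.get(BASE_URL + "/dashboard")
        assert resp.status_code == 200

Devs still write and own the test at the bottom; the fixture up top is what keeps that test to three lines.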

Everyone wins. From what I've read, this is the approach that a lot of large shops follow. Good infrastructure + dev ownership = high quality.

Conclusion

A lot of folks use the term CD loosely. It's easy to see why. Releasing multiple times a day is seductive. Yet, if the test plan can't keep up, it's dangerous.

Set your priorities accordingly. Fast + reliable tests lead to CD.

Test quickly and reliably. Deploy continuously.