Whether you’ve got a team of human QA testers or you’re using one of the many automated testing solutions out there, the truth is this: if your data seeding is poor, your QA strategy is going to suffer tremendously.
Done properly, data seeding lets you run your whole test suite concurrently, keeps your tests focused on the functionality they should be testing, and makes maintenance easier.
Neglect it, and you end up with flaky tests, long test run times, and a general annoyance with your whole testing process.
Over the past few years helping run RainforestQA and interacting with some of our customers, I’ve noticed a few recurring patterns that people fall into when it comes to seeding data for testing:
- “yolo” – This is what most people seem to attempt first when starting their testing journey. The thinking goes that if they can just create the account and resources as part of the test itself, everything will be perfect. Never mind that getting into the required state takes a bunch of extra steps that might fail and cause flakiness in the test suite. Sometimes people venture further down this path and add magic API endpoints to their app to create accounts or resources in a specific state.
- “pets” – Like our devops friends say, cattle are better than pets. We don’t want data or entities that someone manually curated into the perfect test account. This approach doesn’t scale when you want to bump up your test concurrency or start covering some new features, since now you need to manually create some more test accounts and hope they’re all configured properly. When things fail, you’re left wondering if your pet account is in the wrong state or someone has changed it.
With both of these approaches, figuring out how to properly clean up the created resources is complicated. Sometimes you can just wipe your database back to pristine condition, but we often see people trying to delete individual resources piecemeal, or undo the side effects of a test so they can run it again. If you can’t wipe and reset your whole QA database to a known state, it’s impossible to have any real trust in your test suite.
So what is the right thing to do? There are probably a few valid approaches you could take, but some key things you should be doing are:
- Manage your seeds with code – We have different scripts to create each type of account/resource for the various scenarios we need to test. These scripts run against a clean database and create all the necessary database entities. If we want to enable more concurrency, it’s easy to change some values to, say, generate 50 accounts of a particular type instead of 10. For us this process takes about 30 minutes, so we dump the full database state to a `.sql.gz` file at the end of it, ready to be restored quickly.
- Reset your QA environment to a pristine state before running any tests – Running `pg_restore`, using your cloud provider’s database snapshots, or directly running your seed creation scripts against a fresh database are some ways to do this. The most important thing is that this happens immediately before executing any of your tests, in order to give confidence that any issues are not a result of an improper state.
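To make the "seeds as code" idea concrete, here is a minimal sketch in Python. It is not our actual seed script: the `accounts` table, the scenario names, and the `example.test` email pattern are all made up for illustration, and an in-memory SQLite database stands in for a real QA database so the example runs on its own. The point it shows is that the number of accounts per scenario is just a value in code.

```python
import sqlite3

# Hypothetical scenarios: account type -> how many copies to create.
# Raising a number here is all it takes to support more concurrent tests.
SCENARIOS = {"free_trial": 10, "paid_team": 10, "expired_card": 5}


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the (assumed) accounts table on a clean database."""
    conn.execute(
        "CREATE TABLE accounts ("
        "id INTEGER PRIMARY KEY, "
        "email TEXT UNIQUE NOT NULL, "
        "scenario TEXT NOT NULL)"
    )


def seed_accounts(conn: sqlite3.Connection, scenarios: dict[str, int]) -> int:
    """Insert deterministic accounts for each scenario; return rows created."""
    rows = [
        (f"{scenario}-{i}@example.test", scenario)
        for scenario, count in scenarios.items()
        for i in range(1, count + 1)
    ]
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT INTO accounts (email, scenario) VALUES (?, ?)", rows
        )
    return len(rows)


def build_seeded_db(scenarios: dict[str, int]) -> sqlite3.Connection:
    """Start from an empty database and apply every seed."""
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    seed_accounts(conn, scenarios)
    return conn
```

Calling `build_seeded_db(SCENARIOS)` always produces the same accounts with the same emails, so a test can be pointed at, for example, `paid_team-3@example.test` without any manual setup. Against a real database, you would dump the result once the seeds finish and restore from that dump rather than re-running the slow seed step every time.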
And if you want to be even better:
- Make your database reset easy – We have an internal service that listens for a webhook sent from our testing platform telling us that our test suite is about to run. It quickly resets the database before signalling that the run should proceed.
- For some external dependencies, we also run a few scripts after restoring the database to reset these 3rd-party resources to a clean state. This covers things like clearing search indexes, removing 3rd-party OAuth authorizations, or fixing up Stripe subscription dates.
- Make adding seeds easy – Sometimes writing the code to create an account in the right state is difficult. Maybe it has assets (e.g. S3 resources) that need to be uploaded and associated with it. One way you can make this a little easier is to create some helper scripts that can use your public API to pull the relationships and assets out of an existing account and turn it into code that you can then run from a clean slate.
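The webhook-triggered reset from the list above could look roughly like the following Python sketch. It is not our internal service. The dump path, database name, and port are hypothetical, and it assumes the testing platform treats a `200` response as the signal to proceed, which may not match how your platform works. It also assumes the dump is a gzipped plain-SQL file, which is loaded with `psql`; `pg_restore` is the tool for `pg_dump`'s custom, directory, and tar archive formats.

```python
import gzip
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable

# Hypothetical names, not from the article: adjust for your environment.
DUMP_PATH = "seeds/qa.sql.gz"
DATABASE = "qa"


def restore_database(dump_path: str = DUMP_PATH, database: str = DATABASE) -> None:
    """Drop and recreate the database, then load the gzipped SQL dump."""
    # Connection settings come from the usual PGHOST/PGUSER/... variables.
    subprocess.run(["dropdb", "--if-exists", database], check=True)
    subprocess.run(["createdb", database], check=True)
    with gzip.open(dump_path, "rb") as dump:
        sql = dump.read()  # whole dump in memory: fine for a sketch
    # ON_ERROR_STOP makes psql exit non-zero on the first failed statement.
    subprocess.run(
        ["psql", "--quiet", "-v", "ON_ERROR_STOP=1", database],
        input=sql,
        check=True,
    )


def make_server(port: int, reset: Callable[[], None]) -> HTTPServer:
    """Build a server that runs `reset` on every POST before replying."""

    class ResetHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            # Drain the request body; its contents are ignored here.
            length = int(self.headers.get("Content-Length", 0))
            self.rfile.read(length)
            try:
                reset()
                status, body = 200, b"reset complete\n"
            except Exception as exc:
                status, body = 500, f"reset failed: {exc}\n".encode()
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args: object) -> None:
            pass  # keep the sketch quiet; log properly in real use

    # HTTPServer handles one request at a time, so resets never overlap.
    return HTTPServer(("127.0.0.1", port), ResetHandler)
```

Running `make_server(8080, restore_database).serve_forever()` would start the listener. Because the response is only sent after `reset()` returns, a caller that waits for the reply will not start tests against a half-restored database. Any scripts that reset third-party resources could be called from the same `reset` function.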
This approach has worked well for us, enabling us to run our suite of 200+ end-to-end tests in under 15 minutes. At this speed, it’s a no-brainer to run the whole test suite as a gate in our CI/CD pipeline, and the confidence it gives when releasing new functionality is priceless.
