Edge cases that involve non-determinism are inherently tricky to test - if we can't reliably reproduce a scenario, we can't be confident that our test assertions aren’t giving false positives. Race conditions are a notorious example of difficult to test behavior and are often responsible for mysterious data discrepancies.
Race condition tests provide the same general benefit as other tests: it's proof to you and your team that the guarantees a block of code claims to provide remain true in light of future code changes. Since manual verification of race conditions is time consuming and error prone, these types of tests give a big bang for the buck once written.
We'll explore how to use JS Promises to build trustworthy race condition tests, but the same idea can be implemented in any language with similar concurrency primitives.
Consider an application with two tables:
We're in the middle of building a createUser function, and we want to confirm that we're not able to register more users than should be allowed for that workplace. We love tests, so we write one up:
Note: Whenever you see the transaction() function, assume it's set to the REPEATABLE READ isolation level.
It works beautifully, and we ship it. Unfortunately, a few days later, there's a one-off failure in CI. We must have a bug somewhere! Sure enough, we dig into the database and we see a few workplaces that have more users than they should. We can fix the code, but we don’t trust our tests anymore. How do we gain the confidence back? There's some work to do.
Let's take a look at our workplace controller:
We consider for a while and eventually arrive at a theory: Sometimes, both user creation functions execute their SELECT statements before either has had a chance to INSERT their row. In this case, both end up creating a user. If the transactions executed entirely sequentially, however, one would fail as expected.
To make our test deterministic, we'll have to control the execution flow of the transaction body from our test. This necessarily means that we need to introduce something into our "real" code that’s only used by tests. We’re OK with this tradeoff; in addition to letting us write reliable race condition tests, it’ll also hint to future readers to be aware of races. We’ll write a helper to make it as unobtrusive as possible:
Our controllers will use this helper to define specific breakpoints at places we know are potentially contentious. We’ll then be able to adjust our tests to use these breakpoints so that the execution flow can be explicitly controlled. First, the updated controller:
The updated test that now ensures our tests are always failing:
Note that we're no longer immediately awaiting the createUser call. Instead, we're triggering both user creations, waiting for both of the controllers to finish looking up the necessary data, and then allowing both of the controllers to continue. This guarantees that the execution order reproduces our theory.
Our hunch was right! If we force both lookups to happen before the inserts, both transactions end up creating their user, and our test reliably fails.
Note: In a real app, you would likely want to introduce some abstraction to make this easier. For example, the breakpoints class property could be lifted into a base controller’s constructor, which would only set breakpoints in test environments.
Since we're using the REPEATABLE READ isolation level, we have to make sure to update a shared resource to get Postgres to throw a concurrent access error. By doing this, we can push our concurrent modification detection directly down to Postgres - since all other concurrent updaters will also update this shared resource, we don’t have to worry about coordinating with external locks. The easiest resource to use here is the workplace itself since we’re already querying for it. We can trigger an update at the end of the transaction:
Now that we’re correctly locking against a parent resource, our limit is respected! The slower transaction will throw with a concurrent access error. We’ve fixed the bug and eliminated test flakiness.
Doppler takes customer data seriously, and we couldn't imagine doing it without a robust test suite. Concurrency is difficult, but testing it doesn't have to be. Building a reliable way to check your assumptions goes a long way towards gaining confidence in the system now and across future changes. And on that note: the above UPDATE statement actually isn’t the ideal lock to acquire for enforcing these types of counts. Instead, the better solution is a more granular advisory lock. We’ll pick that up in a future blog post and show how to improve this code further without changing the tests at all.