Solving Slow Tests

2020, January 16

Everything has a cost. Some costs are obvious: money, time, overhead. Most, but not all, of them apply to writing tests for your software.

There is no doubt that writing tests often proves useful; that is, the benefits outweigh the costs. However, there is also a point at which writing tests stops being beneficial. Your quickest unit test may have a negligible time cost, but if it’s 100 lines long, the cost of changing it is likely to be large. Whether you like it or not, writing a unit test also makes the unit’s interface harder to change. Still, the benefits usually outweigh the costs. Most tests are fine. Some are not.

One of the most obvious costs of tests is the time it takes to execute them. Unfortunately, many test suites take a long time to run, and once their run time slowly crosses an arbitrary and silent boundary, those test suites become too costly. The boundary sits in a different place for each piece of software, each team, and each individual. Some people cannot tolerate the test suite for a simple webapp taking more than five minutes. The same person would probably accept a much longer run if the test suite were for a medical device, where the consequences of bugs could be fatal. Some people can tolerate test suites taking an hour. You and your colleagues need to identify where your boundary is.

There are lots of reasons for slow tests, but how should we remedy them? Firstly, we should triage, and then apply some common patterns.

Triage

Your options, from easy to hard:

  1. Delete your tests
  2. Delete the tests that take the longest to run
  3. Assess the benefits of each slow test, and decide whether it is worth the cost. If not, delete it.
  4. Run slow tests in parallel
  5. Work out which tests are slow and why they are slow, and see if there is a way to speed them up

Don’t dismiss options 2 and 3. If the costs outweigh the benefits, delete them!

Deleting tests is just like deleting regular code. If they’re testing something that cannot happen, they’re useless. Can they be refactored to eliminate any duplication? Are they actually testing something that happens, or are they exercising an interface in a way that can never occur? Is the behaviour in question already tested somewhere else, by something quicker? If you decide that a test must be kept, can you reduce its scope? Does it really need to touch the filesystem, the database, or the Internet? Can you stub or mock those parts instead?
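As a small illustration of stubbing, here is a Python sketch using the standard library’s unittest.mock. RateClient and price_in are hypothetical names; the point is that the code under test receives its dependency from the caller, so a test can hand it a canned answer instead of going over the network.

```python
from unittest import mock


class RateClient:
    """Hypothetical production client; the real one would call a remote service."""

    def rate_for(self, currency: str) -> float:
        raise NotImplementedError("the real implementation talks to the network")


def price_in(client: RateClient, currency: str, amount: float) -> float:
    """Unit under test: converts an amount using whichever client it is given."""
    return round(amount * client.rate_for(currency), 2)


def test_price_in_converts_using_rate() -> None:
    # spec=RateClient makes the stub reject methods the real client doesn't have.
    stub = mock.Mock(spec=RateClient)
    stub.rate_for.return_value = 1.25  # canned answer, no Internet required

    assert price_in(stub, "EUR", 10) == 12.5
    stub.rate_for.assert_called_once_with("EUR")
```

Passing the dependency in, rather than patching a module-level function, keeps the test independent of import paths; mock.patch is an alternative when the design doesn’t allow injection.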

Mutation testing can help determine which tests should be deleted. It works by automatically making small changes (mutations) to pieces of logic, such as swapping >= for >, running the tests against each mutated version, and recording which tests fail. Some tests never fail, whatever changes in the underlying code. Sometimes, multiple tests fail for the same mutation. Each mutation is an indicator of the value of the tests.
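To make the mechanism concrete, here is a hand-rolled Python sketch; real tools (such as PIT on the JVM, or Stryker) generate and run mutants automatically. The is_adult function, its mutants and the check_* tests are all hypothetical. Each check returns True when it passes, and the report lists which checks fail, or “kill”, each mutant.

```python
import operator


def make_is_adult(compare):
    """Build the unit under test with a pluggable comparison so it can be mutated."""
    def is_adult(age):
        return compare(age, 18)
    return is_adult


# The original uses >=; each mutant swaps the operator, as a mutation tool might.
ORIGINAL = operator.ge
MUTANTS = {">": operator.gt, "<=": operator.le, "==": operator.eq, "!=": operator.ne}


# Each check returns True when it passes against the given implementation.
def check_well_above(is_adult):
    return is_adult(30) is True


def check_also_above(is_adult):
    return is_adult(40) is True


def check_boundary(is_adult):
    return is_adult(18) is True and is_adult(17) is False


CHECKS = [check_well_above, check_also_above, check_boundary]


def mutation_report():
    """Map each mutant to the names of the checks that fail against it."""
    original = make_is_adult(ORIGINAL)
    assert all(check(original) for check in CHECKS), "checks must pass on the original"
    return {
        name: [check.__name__ for check in CHECKS if not check(make_is_adult(compare))]
        for name, compare in MUTANTS.items()
    }


for mutant, killers in mutation_report().items():
    print(f"{mutant:>2}: killed by {killers or 'nothing'}")
```

Here every mutant is killed, but check_also_above kills exactly the same mutants as check_well_above, which makes it a candidate for deletion.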

A special mention for end-to-end testing

End-to-end tests are a special kind of nightmare, and most organisations believe there is no alternative to them. They are incredibly costly in every regard, not just in the time it takes to run them. In some cases, they are the most expensive thing to write and maintain on an entire project. A great deal has been written in favour of scrapping end-to-end tests [1] [2] and verifying systems in production instead.

Parallel

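For tests that cannot be deleted, can they be run in parallel? Bear Amdahl’s law in mind: the overall speed-up is limited by the work that cannot be parallelised. If your slowest test takes 5 minutes, your entire test run will still take at least 5 minutes, even if every test runs in parallel.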
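To make that bound concrete, here is a rough Python sketch that estimates the wall-clock time of a suite on a given number of workers. The greedy longest-first scheduling and the test durations are assumptions for illustration; real test runners distribute work in their own ways.

```python
import heapq


def parallel_wall_time(durations, workers):
    """Estimate wall-clock time for a test run on `workers` parallel workers.

    Greedy schedule: longest test first, each given to the least-loaded worker.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    loads = [0.0] * workers  # a list of zeros is already a valid min-heap
    for duration in sorted(durations, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + duration)
    return max(loads)


# Hypothetical suite: one 5-minute test plus some quicker ones, in seconds.
suite = [300, 60, 60, 60, 30]
for n in (1, 2, 4, 100):
    print(f"{n:>3} workers: {parallel_wall_time(suite, n):.0f}s")
```

With these made-up numbers, one worker takes 510 seconds, while two, four or a hundred workers all take 300 seconds: past a certain point, the only way to go faster is to speed up or delete the slowest test.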

Tests often interact with each other, and cause each other to fail, by using shared resources. Teams often convince themselves that this is fine, and move on. However, with a little design, it’s possible to solve this completely. I recently worked on a system with around 5000 tests interacting with an Oracle database. The test run took less than 60 seconds, because it was parallelised and the tests did not share any database state.

The same thing can be applied to message queues. Read the queue name from config. When each test starts, change the config to use my-queue-435xz53 instead of my-queue, so that each test has a unique queue to work with. Given enough randomness in the names, the tests won’t interact.
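A minimal Python sketch of the idea, assuming the application reads its queue name from a dictionary-like config; BASE_CONFIG, its keys and config_for_test are hypothetical. Each call returns a copy with a random suffix, so tests running in parallel each get their own queue.

```python
import uuid

# Hypothetical application config: the app reads the queue name from here
# rather than hard-coding it.
BASE_CONFIG = {"broker_url": "amqp://localhost:5672", "queue_name": "my-queue"}


def config_for_test(base):
    """Copy the config, giving this test its own randomly named queue."""
    config = dict(base)  # never mutate the shared base config
    suffix = uuid.uuid4().hex[:12]  # 48 random bits per name
    config["queue_name"] = f"{base['queue_name']}-{suffix}"
    return config
```

Twelve hex characters give 48 random bits, which is plenty for thousands of tests. Depending on your broker, each test may also need to create its queue and clean it up afterwards.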

Random + Parallel = Fast.

In Memory

Instead of using your normal message broker, use an in-memory version of it. It’ll be faster. Wrapping a queue data structure in a class and injecting it usually serves most use cases.
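Here is a minimal Python sketch of that pattern: queue.Queue from the standard library wrapped in a small class, and injected into a service through its constructor. InMemoryBroker, OrderService and the message shape are hypothetical; in production you would inject a class with the same publish method that talks to your real broker.

```python
import queue


class InMemoryBroker:
    """Stand-in for a real message broker: a thread-safe FIFO queue in a class."""

    def __init__(self):
        self._messages = queue.Queue()

    def publish(self, message):
        self._messages.put(message)

    def consume(self):
        """Return the next message, or None if nothing is waiting."""
        try:
            return self._messages.get_nowait()
        except queue.Empty:
            return None


class OrderService:
    """Hypothetical service; it only needs something with a publish() method."""

    def __init__(self, broker):
        self._broker = broker  # injected: a real broker client in production

    def place_order(self, order_id):
        self._broker.publish({"event": "order_placed", "order_id": order_id})


# Usage in a test: no network, no broker process, no cleanup.
broker = InMemoryBroker()
OrderService(broker).place_order(42)
assert broker.consume() == {"event": "order_placed", "order_id": 42}
assert broker.consume() is None
```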

The same applies to a database. It is preferable to avoid database access in your tests altogether. However, if your design requires it, use an in-memory database if you can.
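As a sketch, assuming the code under test accepts its connection rather than creating one, Python’s built-in sqlite3 module can provide a private in-memory database per test. UserRepository is a hypothetical example. One caveat: SQLite’s SQL dialect differs from databases such as Oracle, so queries relying on vendor-specific features may behave differently.

```python
import sqlite3


class UserRepository:
    """Hypothetical repository that is handed its connection by the caller."""

    def __init__(self, conn):
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add(self, name):
        cursor = self._conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self._conn.commit()
        return cursor.lastrowid

    def find(self, user_id):
        row = self._conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
        return row[0] if row else None


# Each ":memory:" connection is a brand-new, private database that vanishes on close.
repo = UserRepository(sqlite3.connect(":memory:"))
user_id = repo.add("ada")
assert repo.find(user_id) == "ada"
```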

If you can’t avoid database IO in your tests, see the point above: make sure it’s parallelisable. If that is a struggle, and you’re using a SQL database, could each test run inside a database transaction that is rolled back instead of committed? That effectively isolates each test from the others, potentially with less refactoring of application code.
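A minimal sketch of the transaction approach, again using Python’s sqlite3 module as a stand-in for your real database; rolled_back, withdraw and the accounts table are hypothetical. The test’s changes are visible inside the transaction and discarded afterwards.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def rolled_back(conn):
    """Run the body inside a transaction that is always rolled back, never committed."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.execute("ROLLBACK")


def balance(conn, account_id):
    return conn.execute("SELECT balance FROM accounts WHERE id = ?", (account_id,)).fetchone()[0]


def withdraw(conn, account_id, amount):
    """Hypothetical application code under test; note that it does not commit."""
    conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, account_id))


# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# explicit BEGIN/ROLLBACK above are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")  # committed seed data

with rolled_back(conn):
    withdraw(conn, 1, 30)
    assert balance(conn, 1) == 70  # the test sees its own changes...

assert balance(conn, 1) == 100  # ...but they were never committed
```

This only works if the code under test does not commit on its own. Transactions also belong to a connection, so tests running in parallel each need their own.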

If all else fails, remember to tune your threadpools.

Async

The final section is about dealing with asynchronous tests, as they can be a major source of problems in a test-suite.

Tests that cover async boundaries can be broad in scope; they are generally less reliable and often introduce unnecessary delays. This is partly due to their nature, but good design can rectify it. Async code is not inherently less performant; sometimes it is quite the opposite. However, asynchronous code is harder to reason about, and therefore harder to test.

A common, but bad, solution to testing async code is to add wait conditions to your test suite. Do something, wait ten seconds, and check the result. As a technique, it is very simple, easily understood, and often works. However, choosing how long to wait is tricky. Choose a value that is too small, and the test will fail. Choose a value that is too large, and it adds unnecessary delay. It is an optimisation problem, and a tricky one too, because the ideal value depends on how fast the hardware running the test is.

A step up in complexity and effectiveness is an eventually construct that uses polling. It specifies a maximum time within which a test condition must be met, and also how frequently to check the condition. This is still an optimisation problem, but the unnecessary delay is at most one polling interval. A related solution, in languages such as JavaScript, is to await async functions or use callbacks, so that the test checks its result as soon as the code completes.
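Here is a minimal Python sketch of an eventually helper; the name and its timeout and interval parameters are my own choices rather than any particular library’s API. It re-checks the condition at each polling interval and fails once the timeout has passed.

```python
import threading
import time


def eventually(condition, timeout=5.0, interval=0.05):
    """Poll `condition` every `interval` seconds until it returns a truthy value.

    Raises AssertionError if it is still falsy after `timeout` seconds. Once
    the condition becomes true, at most one polling interval is wasted.
    """
    deadline = time.monotonic() + timeout
    while not condition():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise AssertionError(f"condition not met within {timeout}s")
        time.sleep(min(interval, remaining))


# Usage: a hypothetical background job that finishes after roughly 0.2 seconds.
results = []


def background_job():
    time.sleep(0.2)
    results.append("done")


threading.Thread(target=background_job).start()
eventually(lambda: results == ["done"], timeout=2.0, interval=0.05)
```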

There are also techniques that sidestep the complexity of async tests altogether. It is possible to construct your logic as interpreters, compose them together as synchronous calls, and then test that. If that didn’t make any sense, read about testing with Tagless Final (in Scala). I’ve also touched on this topic before, using the Id monad, in my post on Higher Kinded Play Framework.
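A rough Python analogue of the idea follows. Python has no higher-kinded types, so this is not Tagless Final proper: the business logic (notify_user) is written only against pure, flat_map and two domain operations, and each interpreter decides what an “effect” is. IdInterpreter plays the role of Scala’s Id, where an effect is just the value, so tests run synchronously; FutureInterpreter runs the same logic on a thread pool. All of the names are hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def notify_user(interp, user_id, message):
    """Business logic; it never knows whether effects are plain values or futures."""
    return interp.flat_map(
        interp.find_email(user_id),
        lambda email: interp.pure(False) if email is None else interp.send(email, message),
    )


class IdInterpreter:
    """Test interpreter: each effect is just the value itself (like Scala's Id)."""

    def __init__(self, emails):
        self.emails = emails
        self.sent = []

    def pure(self, value):
        return value

    def flat_map(self, value, f):
        return f(value)

    def find_email(self, user_id):
        return self.emails.get(user_id)

    def send(self, email, message):
        self.sent.append((email, message))
        return True


class FutureInterpreter:
    """Production-style interpreter: each effect is a concurrent.futures.Future."""

    def __init__(self, executor, emails):
        self.executor = executor
        self.emails = emails  # stands in for a remote user service

    def pure(self, value):
        future = Future()
        future.set_result(value)
        return future

    def flat_map(self, future, f):
        result = Future()

        def copy_outcome(done):
            if done.exception() is not None:
                result.set_exception(done.exception())
            else:
                result.set_result(done.result())

        def continue_with(done):
            try:
                f(done.result()).add_done_callback(copy_outcome)
            except Exception as error:  # also covers a failed first step
                result.set_exception(error)

        future.add_done_callback(continue_with)
        return result

    def find_email(self, user_id):
        return self.executor.submit(self.emails.get, user_id)

    def send(self, email, message):
        return self.executor.submit(lambda: True)  # pretend the email was delivered


# In tests: synchronous, no threads, nothing to wait for.
fake = IdInterpreter({1: "ada@example.com"})
assert notify_user(fake, 1, "hello") is True
assert fake.sent == [("ada@example.com", "hello")]

# In production: the same logic, run asynchronously.
with ThreadPoolExecutor() as executor:
    live = FutureInterpreter(executor, {1: "ada@example.com"})
    assert notify_user(live, 1, "hello").result(timeout=5) is True
```

The synchronous test needs no sleeping or polling at all, because with the Id-style interpreter there is nothing asynchronous left to wait for.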

A Final Note

Remember that everything is a trade-off. Writing tests can be beneficial, but you will also incur costs by doing so. Good design in your test suite can lessen these costs substantially.