Tips for Fixing Your Flaky Tests

Written by kevinmasur | Published 2022/09/04
Tech Story Tags: debugging | testing | test-automation | unit-testing | integration-testing | flaky-tests | bug-fixing | programming

Tired of re-running your build 3 times before you can get all of the tests to pass? Don’t start deleting all of those tests just yet. Fixing some of them might be easier than you think.

Replicate the issue

When debugging any issue, I generally start by trying to replicate it. With flaky tests, the best first step is to run the test repeatedly and see if you can get it to fail. If the failure is “random,” then running the test on repeat will eventually trigger whatever random event is causing it.

Rather than clicking the Run button 1,000 times, check whether your test framework has some tricks for running your test on repeat.

If you happen to be using TestNG, the @Test annotation can take an invocationCount to repeat a test multiple times:

@Test(invocationCount = 100)

Another option if you’re using JUnit with Spring is to use Spring’s @Repeat annotation:

@Repeat(100)
@Test

And if your language or framework doesn’t have an option for repeating a test, you can always run it in a simple loop.
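
As a rough sketch (assuming a JUnit 4-style test, where runFlakyScenario() is a hypothetical stand-in for whatever the flaky test exercises), that loop could look like this:

    @Test
    public void repeatFlakyScenario() {
        for (int i = 0; i < 100; i++) {
            // The first failing iteration stops the loop and fails the test,
            // so the message shows how many runs it took to reproduce.
            assertTrue("failed on iteration " + i, runFlakyScenario());
        }
    }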

If running the test on repeat didn’t replicate the issue, taking a look at the logs from the failed run might give a hint as to the cause and how to replicate it. You can trace your steps back from something like a null pointer exception, for instance, and figure out what would cause that variable to end up null. This might give you an idea of what is pushing the test into the state that leads to the failure.

If all else fails, and you can’t get a flaky test to fail on demand, there’s no shame in adding some logging to the test around the broken area and waiting for it to fail again in your pipeline. It won’t immediately fix the issue, but it can help give you the info you need to fix it later.

Common Causes

After fixing dozens of flaky tests in our legacy system, I’ve noticed a few common patterns start to emerge. The first is that unit tests are usually more stable than integration tests, and integration tests are more stable than end-to-end automation tests. This shouldn’t be too surprising: in unit tests you have complete control over the environment. As you move up the chain, though, the environment your test runs in starts to have more variability. This is one of the many reasons why, although integration and end-to-end tests are important, the majority of your automated tests should be unit tests.

Unit Tests

The most common issue I see in unit tests is not properly resetting data between tests. If you have multiple tests in the same test file, it can be easy to accidentally reuse the same collection, object, or mock across tests without creating a new instance or resetting it each time. This is especially easy to miss when the data is stored in the unit under test. The result is a test that’s flaky based on the order in which the tests run. While the order the test framework uses may be consistent, allowing the tests to pass most of the time, it is often not guaranteed. And if by some random chance the tests run in a different order, one or more of them can fail. The best way to handle this is to make sure everything gets reset between each test case, rather than before/after the whole group of tests is executed.
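
As a minimal sketch (CounterService is a made-up unit under test), recreating everything in a setup method that runs before each individual test keeps the results independent of execution order:

    public class CounterServiceTest {

        private CounterService service;

        @Before
        public void setUp() {
            // Recreate the unit under test (and any collaborators) before every
            // test instead of sharing one instance across the whole class.
            service = new CounterService();
        }

        @Test
        public void firstIncrementReturnsOne() {
            assertEquals(1, service.increment());
        }

        @Test
        public void secondIncrementReturnsTwo() {
            service.increment();
            assertEquals(2, service.increment());
        }
    }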

Another common cause I have seen for flaky unit tests is using random data for test data. Some developers like to use random strings or numbers, thinking that any of the values that could be returned would suffice. But they often forget that a random integer can be zero, or that two random pieces of data can end up being equal. This changes the condition the test was meant to cover and breaks it. My advice is to ALWAYS prefer hardcoded values for test data over randomized/generated data. And if the data needs a “random” element to prevent collisions (like a unique title or name), then at least make part of it static.

You should also use a different hardcoded value for each discrete piece of data. This avoids confusion over what the value of something should be when validating. Lastly, avoid assigning data the default value of its datatype (setting a number to 0, for example), as it can be easy to miss whether the value was ever actually set.
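
For example (the field names here are made up), distinct hardcoded values with only the uniqueness-sensitive part generated might look like this:

    // Distinct, non-default, hardcoded values for each piece of data.
    private static final String FIRST_NAME = "Ada";
    private static final String LAST_NAME = "Lovelace";
    private static final int AGE = 37; // not 0, so a missed assignment stands out

    // When a value must be unique (a title, a username), keep a static,
    // searchable prefix and only generate the suffix.
    private static String uniqueTitle() {
        return "flaky-test-title-" + UUID.randomUUID();
    }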

The hardest issues to deal with though are ones dealing with time. Testing with time can be tricky because how long your test takes to execute is variable, and with unit tests especially, we try to control every variable to ensure consistency. A common problem I see breaks down to something similar to this:

    @Test
    public void testTimeWrong() {
        LocalDateTime start = LocalDateTime.now();

        LocalDateTime test = LocalDateTime.now();

        assertTrue(start.isBefore(test));
    }

On most systems, Java’s LocalDateTime.now() will return with millisecond precision. So if the start and test variables are initialized within the same millisecond, start won’t be “before” test; it will be equal to it. But the above will pass most of the time, since more than a millisecond usually elapses between the two statements.

Another common problem when testing with time is time zones. Different systems may have different default zones, and if you don’t ensure your testing environment uses a consistent zone, tests can fail in other environments. While it’s a good rule of thumb to always use UTC for time, not every system was designed that way from the beginning. Normally you want your tests to run in as production-like a setting as possible, so if your production environment uses something other than UTC, you’ll need to ensure your test environment is consistent with that. There’s no point in configuring your test environment in a way that lets your tests pass but doesn’t match how the code runs in production.

Time zone issues love to crop up every spring and fall, when those of us developing in countries that observe daylight saving time change our clocks. They also enjoy showing up when you run your tests late in the day (or early in the morning), depending on how far your location’s zone is from UTC (or your system’s standard zone). In that case, one time in your test is using UTC while another is using the local zone, and if you try to confirm that both times fall on the same “day,” you’ll end up with issues.
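
One way to keep the zone consistent is to pin the JVM’s default zone once for the whole test class. This is only a rough sketch; the zone string is a placeholder you’d match to whatever your production systems actually run in:

    @BeforeClass
    public static void pinDefaultTimeZone() {
        // Keep the test run's default zone independent of where the build
        // agent happens to be located.
        TimeZone.setDefault(TimeZone.getTimeZone("America/Chicago"));
    }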

There are two main ways I handle time in my tests to prevent these kinds of issues. The preferred method is to take control of it: if the source code allows for it, I make it possible to mock the clock in my system. Then my unit tests have complete control and can set a hardcoded value for the clock whenever a new time variable is initialized.

    @Test
    public void testWithMockClock() {
        ZoneId zoneId = TimeZone.getDefault().toZoneId();
        Instant instant = Instant.now().plus(1, ChronoUnit.MINUTES);
        Clock clock = Mockito.mock(Clock.class);
        when(clock.instant()).thenReturn(instant);
        when(clock.getZone()).thenReturn(zoneId);

        LocalDateTime test = LocalDateTime.now(clock); // somewhere in source code

        assertEquals(LocalDateTime.ofInstant(instant, zoneId), test);
    }

If mocking/controlling the clock isn't reasonable within the source or the test, the next best defense is to ensure your tests allow for these inconsistencies in the timing. Ensure that any time objects you create in your test for comparison use the same function the source does to avoid timezone inconsistencies between your test and the unit under test. When asserting times make sure to validate in a way that will handle different test execution speeds:

    @Test
    public void testTimeBetter() {
        LocalDateTime start = LocalDateTime.now();

        LocalDateTime test = LocalDateTime.now();

        assertFalse(start.isAfter(test)); // ensure test time is equal to or after start
        assertFalse(test.isAfter(LocalDateTime.now())); // ensure test time is before the assertion
    }

Integration Tests

Integration tests can suffer from the same issues as unit tests, but they also have some new issues that can lead to flaky tests.

To clarify what I mean by integration tests: the application is running, but the test has access to the internals of the system, so it is not truly “black box.” External dependencies are usually mocked, though I find it common practice to have any databases or caches the application interacts with directly up and running.

For these tests, a common issue is data from one test leaking into another, whether that’s a configuration change or the creation, deletion, or update of data for a test. Because these tests run against a database or cache, it’s easy for data to linger there between tests. This can be tricky to debug, as the individual test will frequently pass on its own but fail when you run the full suite of integration tests.

A good trick for debugging this can be running the full suite of tests locally first, to get the database into a bad state, then running the individual test after and seeing if it now consistently fails. If it does, odds are another test’s data is messing up the data for your flaky test.

It can be difficult to track down which test is the source of the bad data, as it’s not the test that’s failing, but a few tricks to figure it out are:

  • Search your test code for anything that matches the bad test data; if you’ve hardcoded your test data and made it unique for each test, it should be easy to find

  • Check any created/updated dates on the bad test data and see which test in the suite was executing at that time

  • Run sections of your test suite until you find which one creates the bad data, then start breaking that section down until you’ve determined the individual test causing the problem

  • Remember that whatever test is causing the issue had to run before the failing one, so checking your test logs to determine the order the tests ran in can help narrow it down as well

The main way to fix this problem, though, is a good rule of thumb for your tests in general: any data a test creates or changes should be reverted after the test, and any data a test relies on should be created or set at the beginning of that individual test. That way each test uses its own data and is neither relying on, nor affected by, the data from another test.
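
As a sketch of what that can look like (orderRepository and Order are hypothetical, Spring-Data-style names), each test creates its own record up front and removes it afterwards:

    private Order testOrder;

    @Before
    public void createTestData() {
        // Each test creates the data it needs, with a unique, searchable value.
        testOrder = orderRepository.save(new Order("flaky-test-order-123"));
    }

    @After
    public void cleanUpTestData() {
        // Revert anything the test created so the next test starts clean.
        orderRepository.deleteById(testOrder.getId());
    }

    @Test
    public void findsTheOrderItCreated() {
        assertTrue(orderRepository.findById(testOrder.getId()).isPresent());
    }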

End To End Tests

I find end-to-end automation tests to be the hardest to keep consistent. There’s a reason you should focus on writing a large number of unit tests and fewer automated end-to-end tests. They can suffer from many of the issues mentioned above, and more. The worst and most common one I encounter though is a slight delay in timing causing an action to misfire.

For example, a page takes slightly longer to render, preventing a button click from happening. This is especially frustrating because it can happen at any point in your test, so even the failure doesn’t occur in a consistent place.

What I’ve found works best is writing your tests so that when they interact with the UI, they wait for the element to appear rather than immediately interacting with it. This is best done universally around your page interactions, rather than handling it on a case-by-case basis.
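
A minimal sketch of that kind of shared helper, assuming Selenium’s Java bindings, waits for the element to become clickable before touching it:

    private WebElement clickWhenReady(WebDriver driver, By locator) {
        // Wait (up to 10 seconds) for the element to be visible and clickable,
        // rather than clicking immediately and hoping the page has rendered.
        WebElement element = new WebDriverWait(driver, Duration.ofSeconds(10))
                .until(ExpectedConditions.elementToBeClickable(locator));
        element.click();
        return element;
    }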

Another thing that really helps with debugging end-to-end test failures is logging. Log the major interactions of your test so that it’s easy to see where in the flow the failure is occurring. But you don’t have to log only text: you can set up your test framework to take a screenshot of the UI when a test fails. Especially with automated UI tests, a picture is worth a thousand words. A stack trace or log from a failed Selenium test can be hard to decipher, but a screenshot often makes the issue obvious.
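
As one possible sketch, assuming Selenium with JUnit 4, a TestWatcher rule can grab a screenshot whenever a test fails (driver here is assumed to be the test’s WebDriver field):

    @Rule
    public TestWatcher screenshotOnFailure = new TestWatcher() {
        @Override
        protected void failed(Throwable e, Description description) {
            // Capture the browser state at the moment of failure and save it
            // next to the build output, named after the failing test.
            byte[] png = ((TakesScreenshot) driver).getScreenshotAs(OutputType.BYTES);
            try {
                Files.write(Paths.get(description.getMethodName() + ".png"), png);
            } catch (IOException io) {
                // Don't let a screenshot problem hide the original test failure.
            }
        }
    };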

May your pipeline always be green!

I hope this guide helps you fix any flaky tests you encounter, and gives you some ideas for writing better, more consistent tests in the future.


Written by kevinmasur | Software Engineer with a focus on Java, Testing, CI/CD, Databases, and Web Development