How to Deal With Flapping or Broken Tests

Tests Are Failing

"Quality is not an act, it is a habit" - Aristotle

Introduction

Tests are useful and make our lives better. Sometimes, however, tests are flapping or simply broken. In this context, a flapping or broken test is one whose failures are unrelated to the changes you have made.

And it always happens when you need to land something urgent. When tests do not work as they should, you have to invest time to understand the root cause and fix it, which makes you slower and less productive and demotivates you.

You lose focus on the main problem, and now the flapping test is the enemy: you have to decide what to do next. Spend some time and fix the test if possible, disable it, or just drop it. None of these is a silver bullet, and a trade-off has to be made.

What Are the Options?

There are two straightforward solutions: fix the test or drop it. Both have pros and cons, and neither is universal. Fixing the test may turn out to be time-consuming, especially if it flaps because other services or resources are unavailable or unstable. Dropping the test, on the other hand, decreases code coverage and simply ignores the problem.

Another way to mitigate the problem is to automatically re-run the flapping test. This is time-consuming and not 100% reliable.
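
To make the re-run idea concrete, here is a minimal sketch of a retry wrapper in plain Python. The `retry` decorator and the `flaky_check` function are hypothetical names used only for illustration; they are not part of pytest or of any plugin mentioned in this article.

```python
import functools


def retry(times=3, exceptions=(Exception,)):
    """Re-run the wrapped callable up to `times` times (hypothetical helper)."""
    if times < 1:
        raise ValueError("times must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(times):
                try:
                    return func(*args, **kwargs)
                except exceptions as error:
                    last_error = error  # remember the failure and try again
            raise last_error  # every attempt failed
        return wrapper
    return decorator


attempts = {"count": 0}


@retry(times=3)
def flaky_check():
    # Hypothetical check that fails on its first two calls and then passes.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise AssertionError("flaky failure")
    return "ok"


result = flaky_check()
```

The check passes only on the third attempt, so the run takes three times as long, and a test that fails more often than the retry budget allows still fails. That is the trade-off listed for auto retry.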

Method	Pros	Cons
Drop or skip tests	Fast	Decreases test coverage; increases the risk of future problems
Fix tests	Proper way :-)	Increases time-to-market
Auto retry (flaky retry)	The test will run and catch an error if it happens	Increases test time; increases time-to-market because of test run time; does not guarantee 100% that tests pass
Skip with deadline	Unblocks you from delivering the current task; maintains test coverage; allows you to plan test maintenance	Temporarily decreases test coverage

Skip With Deadline

The main idea of this method is to skip the test and ignore its results until a certain date. For example, you have an urgent fix to land, and one of your tests is flapping and blocks you in the CI/CD pipeline. This is annoying, this is time-consuming, and this pisses you off.

The obvious solution is to comment out or delete the test to unblock the fix in the pipeline, which might not be the best approach in this case. Fixing the test is not an option either: it might take too much time or might not even be possible at the moment.

Another way is to add a skip decorator and fix the test later. The problem is that "later" may never come, and the test will be forgotten and skipped forever.

A better way, I think, is to skip the test with a deadline and a comment that provides context and a reference for the follow-up, e.g., a task or a plan. This approach lets you land your fix and, as a good citizen, plan and highlight the blocker so that it gets resolved later.
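
The idea can be sketched with nothing but the standard library. The snippet below is an illustration under assumptions, not the implementation of the plugin described in this article: `skip_before` is a hypothetical helper built on `unittest.skipIf`, and the dates are picked only so that one test is always suppressed and the other always runs.

```python
import unittest
from datetime import datetime


def skip_before(deadline, reason, now=None):
    """Skip the decorated test while `now` is before `deadline` (hypothetical helper)."""
    current = now or datetime.now()
    message = f"suppressed until {deadline:%Y-%m-%d}: {reason}"
    # unittest.skipIf leaves the test untouched when the condition is False.
    return unittest.skipIf(current < deadline, message)


class AppTest(unittest.TestCase):
    @skip_before(datetime(2999, 1, 1), "flaky dependency, JIRA-12346")
    def test_still_suppressed(self):
        self.fail("not executed while the deadline is in the future")

    @skip_before(datetime(2000, 1, 1), "deadline already passed")
    def test_deadline_passed(self):
        # The deadline is in the past, so the test runs again.
        self.assertEqual(1 + 1, 2)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AppTest)
result = unittest.TestResult()
suite.run(result)
```

Here the first test is reported as skipped together with its reason, while the second one runs as usual. Note that the condition is evaluated when the module is imported. In pytest, a similar effect can be had with `pytest.mark.skipif(datetime.now() < deadline, reason=...)`, although that alone does not report deadlines that have already passed.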

Simple pytest Plugin

I’ve crafted a simple pytest plugin to get this functionality. The code of the plugin can be found on GitHub: https://github.com/bp72/pytest-skipuntil.

To install the plugin, run:

pip install pytest-skipuntil

Also, to imitate flaky behavior and repeat our test N times, we will need the pytest-flakefinder plugin:

pip install pytest-flakefinder

Let’s create a simple app.py to emulate flaking behavior:

import random


class App:
    def __init__(self, a, b, fail_ratio=3):
        self.choice = [True] + [False]*fail_ratio
        self.a = random.randint(1, 100) if random.choice(self.choice) else a
        self.b = b
    
    def get_data(self):
        if random.choice(self.choice):
            raise Exception("oops")
        return [1, 2, 3]

Let’s add a simple test for the app:

import pytest

from datetime import datetime
from app import App


def test_app():
    s = App(1, 2)
    assert s.a == 1
    assert s.b == 2
    assert s.get_data() == [1, 2, 3]

Run it:

(.venv) bp:/mnt/hdd2/projects/python/dummyapp$ pytest --flake-finder --flake-runs=5 
====================== test session starts ======================
platform linux -- Python 3.10.12, pytest-7.4.3, pluggy-1.3.0
rootdir: /mnt/hdd2/projects/python/dummyapp
plugins: flakefinder-1.1.0, skipuntil-0.2.0
collected 5 items                                                                                                                                                                                                                              

test_app.py FF.F.                                                                                                                                                                                                                        [100%]

====================== FAILURES ======================
______________________ test_app[0] ___________________

    def test_app():
        s = App(1, 2)
>       assert s.a == 1
E       assert 8 == 1
E        +  where 8 = <app.App object at 0x7f8f24b10fd0>.a

test_app.py:9: AssertionError
______________________ test_app[1] ______________________

    def test_app():
        s = App(1, 2)
        assert s.a == 1
        assert s.b == 2
>       assert s.get_data() == [1, 2, 3]

test_app.py:11: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <app.App object at 0x7f8f24b11060>

    def get_data(self):
        if random.choice(self.choice):
>           raise Exception("oops")
E           Exception: oops

app.py:12: Exception
______________________ test_app[3] ______________________

    def test_app():
        s = App(1, 2)
        assert s.a == 1
        assert s.b == 2
>       assert s.get_data() == [1, 2, 3]

test_app.py:11: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <app.App object at 0x7f8f24ac7820>

    def get_data(self):
        if random.choice(self.choice):
>           raise Exception("oops")
E           Exception: oops

app.py:12: Exception
=================== short test summary info ===================
FAILED test_app.py::test_app[0] - assert 8 == 1
FAILED test_app.py::test_app[1] - Exception: oops
FAILED test_app.py::test_app[3] - Exception: oops
=================== 3 failed, 2 passed in 0.01s ===================

As you can see, the test failed 3 times out of 5. This example is synthetic, but in real life I've seen this numerous times: tests fail because of dependency service timeouts, resource shortages, database inconsistency, and many other reasons.
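
As a sanity check, the failure rate of the synthetic app can be worked out directly from app.py. The sketch below assumes the default `fail_ratio=3` and that the two `random.choice` calls are independent; the variable names are mine.

```python
from fractions import Fraction

fail_ratio = 3  # default in App.__init__
# self.choice holds one True and fail_ratio False values.
p_true = Fraction(1, 1 + fail_ratio)

# `a` is replaced by random.randint(1, 100) with probability p_true;
# the replacement still equals 1 once in 100 draws.
p_a_ok = (1 - p_true) + p_true * Fraction(1, 100)

# get_data() raises with probability p_true.
p_data_ok = 1 - p_true

p_pass = p_a_ok * p_data_ok
p_fail = 1 - p_pass
expected_failures = 5 * p_fail  # for --flake-runs=5

print(p_pass, float(p_pass))
print(expected_failures, float(expected_failures))
```

Under these assumptions a single run passes with probability 903/1600, a bit over 56%, so a little more than two failures out of five runs is the expected outcome, and three failures is not unusual.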

Let’s update the test with the mark:

@pytest.mark.skip_until(deadline=datetime(2023, 12, 1), strict=True, msg="Alert is suppresed. JIRA-12346")
def test_app():
    s = App(1, 2)
    assert s.a == 1
    assert s.b == 2
    assert s.get_data() == [1, 2, 3]

When the test runs, we get a warning that it is suppressed until a certain date, along with the given comment. I prefer the comment to reference a planning tool artifact, e.g., a task or an issue.

(.venv) bp:/mnt/hdd2/projects/python/dummyapp$ pytest --flake-finder --flake-runs=5 
====================== test session starts ======================
platform linux -- Python 3.10.12, pytest-7.4.3, pluggy-1.3.0
rootdir: /mnt/hdd2/projects/python/dummyapp
plugins: flakefinder-1.1.0, skipuntil-0.2.0
collected 5 items                                                                                                                                                                                                                              

test_app.py sssss                                                                                                                                                                                                                        [100%]

test_app.py: The test is suppressed until 2023-12-01 00:00:00. The reason is: Alert is suppresed. JIRA-12346
test_app.py: The test is suppressed until 2023-12-01 00:00:00. The reason is: Alert is suppresed. JIRA-12346
test_app.py: The test is suppressed until 2023-12-01 00:00:00. The reason is: Alert is suppresed. JIRA-12346
test_app.py: The test is suppressed until 2023-12-01 00:00:00. The reason is: Alert is suppresed. JIRA-12346
test_app.py: The test is suppressed until 2023-12-01 00:00:00. The reason is: Alert is suppresed. JIRA-12346
====================== 5 skipped in 0.00s ======================

To simulate a missed deadline, set the deadline argument to a date in the past. The test starts to fail again, with a note that the deadline for the suppression has passed.

(.venv) bp:/mnt/hdd2/projects/python/dummyapp$ pytest --flake-finder --flake-runs=5 
====================== test session starts ======================
platform linux -- Python 3.10.12, pytest-7.4.3, pluggy-1.3.0
rootdir: /mnt/hdd2/projects/python/dummyapp
plugins: flakefinder-1.1.0, skipuntil-0.2.0
collected 5 items                                                                                                                                                                                                                              

test_app.py FFF..                                                                                                                                                                                                                        [100%]

====================== FAILURES ======================
______________________ test_app[0] ______________________

    @pytest.mark.skip_until(deadline=datetime(2023, 11, 1), strict=True, msg="Alert is suppresed. JIRA-12346")
    def test_app():
        s = App(1, 2)
>       assert s.a == 1
E       assert 73 == 1
E        +  where 73 = <app.App object at 0x7f68ee4a0970>.a

test_app.py:9: AssertionError
______________________ test_app[1] ______________________

    @pytest.mark.skip_until(deadline=datetime(2023, 11, 1), strict=True, msg="Alert is suppresed. JIRA-12346")
    def test_app():
        s = App(1, 2)
        assert s.a == 1
        assert s.b == 2
>       assert s.get_data() == [1, 2, 3]

test_app.py:11: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <app.App object at 0x7f68ee4a28f0>

    def get_data(self):
        if random.choice(self.choice):
>           raise Exception("oops")
E           Exception: oops

app.py:12: Exception
______________________ test_app[2] ______________________

    @pytest.mark.skip_until(deadline=datetime(2023, 11, 1), strict=True, msg="Alert is suppresed. JIRA-12346")
    def test_app():
        s = App(1, 2)
>       assert s.a == 1
E       assert 17 == 1
E        +  where 17 = <app.App object at 0x7f68ee4b5540>.a

test_app.py:9: AssertionError
test_app.py::test_app[0]: the deadline for the test has passed
test_app.py::test_app[1]: the deadline for the test has passed
test_app.py::test_app[2]: the deadline for the test has passed
test_app.py::test_app[3]: the deadline for the test has passed
test_app.py::test_app[4]: the deadline for the test has passed
====================== short test summary info ======================
FAILED test_app.py::test_app[0] - assert 73 == 1
FAILED test_app.py::test_app[1] - Exception: oops
FAILED test_app.py::test_app[2] - assert 17 == 1
====================== 3 failed, 2 passed in 0.01s ======================

A useful feature of the plugin is that it prints information about skipped tests and about tests whose deadlines have passed. This helps to keep track of skipped tests, so they are not forgotten and get fixed on time.

Conclusion

This approach has kept me from going mad when I have a hotfix to land and random tests fail and block the CI/CD pipeline. It has also contributed very positively to daily time management, since everything is scheduled and nothing gets lost in the rush!