A Beginner’s Guide to Automated Testing

Written by katbusch | Published 2016/11/13


This is the second entry in my series on navigating large codebases for new software engineers. It’s not necessary to read the first article to understand this one, but you can find it here.

Software tests are the best gift an engineer can give themselves. Tests make your life and everybody else’s easier.

When I started as a software engineer I had absolutely no idea why tests were important, and I felt pretty lost on where to start. From my experience mentoring and teaching new engineers, I know I'm not alone. So this is where you start. Here, with real-world examples, I'll make the case for automated software testing and explain how to get started writing tests in a disciplined, thoughtful way.

Part I: The case for testing

Making your life easier

Years ago, when I was starting on my first project in my first job out of school, I didn’t write any tests. I knew that theoretically tests were good (and I’d written them during my internships) but I didn’t really have a solid understanding of why. I didn’t think it mattered much if I didn’t write tests at all or at least didn’t write them until the last minute. I was on a deadline and I didn’t want to get bogged down writing all that unnecessary extra code.

I was creating a mobile A/B testing system for Dropbox’s Android app. I submitted the Java component of the system for code review, with a note: “TODO: write tests”. As I developed the code, I engaged in an incredibly tedious manual testing process that involved setting up a test server on my computer, installing the app on a test phone, and manually creating an A/B test on the test phone. Obviously I didn’t test very many code paths because it was just so tedious.

After much back and forth, the code reviewer said “You can’t merge this without tests.” So I wrote some basic tests and merged it.

Lo and behold, I soon needed to fix a small bug. I then had to test that everything worked after the fix. “Hmm…,” I thought. “I’ve written some tests for this. I suppose I could use those.” I ran the tests. Within a few seconds, I knew that everything still worked! Not just a single code path (as in a manual test), but all code paths for which I’d written tests! It was magical. It was so much faster than my manual testing. And I knew I didn’t forget to test any edge cases, since they were all still covered in the automated tests.

This gets to one of the most important reasons for tests: tests can make your development process faster. If I'd written those tests earlier, I could have simply run them in under a minute instead of tediously poking and println-ing my code over and over to verify it was doing what I expected.

Preventing new bugs over time

For large codebases, preventing bugs over time is the most important benefit of tests. In a codebase with many engineers contributing, you do not necessarily have control over changes to your code, and especially not over changes to code that your code depends on. With dozens of commits a day, just by chance one of those commits will eventually, inadvertently change a behavior that your code relies on.

If you haven’t written tests, then there’s no reliable way for other coders to know that their commit has impacted yours. Good tests are an explicit signal to other engineers (or a future version of yourself) of your assumptions about the behavior of your code and its dependencies. In an ever-evolving codebase of millions of lines, how else could others possibly know that your important (but only semi-related) code will no longer work because of their change? If your code is still in the codebase a year (or five) after you’ve committed it and there are no tests for it, bugs will creep in and nobody will notice for a long time.

I once watched an obscure but important user-facing feature stay broken for weeks because of a change in a seemingly unrelated part of the code. It took a build-up of user reports trickling through customer support to engineering before anybody noticed. Afterwards, the team wrote a postmortem about how the unclear dependency structure allowed this seemingly unrelated code to break their feature and go unnoticed. The postmortem didn't even mention the simple, obvious thing that would have prevented the whole breakage in the first place: a test! (Of course, better code structure is also very important.)

Tests are your best weapon against the complexity of large codebases. However hard you try to keep your code clean and clear, something will always break. Remember the rule: if it matters that the code works, write a test for it. There is no other way to guarantee it will work.

Part II: Tips for effective testing

Here are some basic tips to make your tests maximally useful and minimally painful. I will introduce some testing concepts and terminology. This is by no means complete. There are entire books worth reading on the topic, but it’s a good start.

Many small tests vs one big test

A lot of times I come across monolithic tests that check fifteen behaviors of integers in a row, something like the sketch below. I recommend having fifteen separate tests instead, for two reasons.
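For illustration, a hypothetical monolithic test might look something like this (a Python sketch invented for this article, not real test code):

def test_integers():
    assert 2 + 3 == 5      # addition
    assert 5 - 3 == 2      # subtraction
    assert 2 * 3 == 6      # multiplication
    assert 6 // 3 == 2     # floor division
    # ...eleven more checks in a row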

The first reason is purely practical: if the test fails halfway through, you won't know whether the second half of your fifteen checks pass, because they never run. That makes debugging harder. If you know exactly which three of the fifteen are failing, you can often identify the problem immediately.

The second reason is a subtler readability issue. With a glob of fifteen checks in a row, it’s harder for another engineer reading them to grok exactly what’s being tested. It’s also harder for them to see whether they need to add a new test case after a change or whether their case is already covered. If there are fifteen separate tests, it’s pretty easy to read through the neat names:

def test_integer_addition():
    assert 2 + 3 == 5

def test_integer_subtraction():
    assert 5 - 3 == 2

It’s also very easy for readers to look at how each function is set up and add a new test.

Make it easy to add new tests

It should be obvious to somebody editing your code how to test their changes. I recommend writing helper functions in your test file to make setup and teardown simple; most testing frameworks also provide "before each" and "after each" hooks. If adding a new test case is extremely easy, you can be more confident that your library will keep high test coverage over time. But if an author has to spend a lot of time figuring out exactly how to create the appropriate inputs, then in a time crunch they might decide to skip the test altogether. Always make it easy for people to do the right thing.
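For example, here's a minimal sketch of that idea using pytest fixtures. UserStore is a hypothetical class defined inline so the example is self-contained:

import pytest

class UserStore:
    # hypothetical code under test
    def __init__(self):
        self.users = set()
    def add(self, name):
        self.users.add(name)
    def count(self):
        return len(self.users)

@pytest.fixture
def store():
    s = UserStore()    # runs before each test: a fresh, empty store
    yield s
    s.users.clear()    # runs after each test: clean up

def test_add_user(store):
    store.add("alice")
    assert store.count() == 1

def test_add_is_idempotent(store):
    store.add("alice")
    store.add("alice")
    assert store.count() == 1

With the fixture in place, adding a sixteenth test is one small function, so nobody is tempted to skip it.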

Unit tests vs integration tests

Unit tests exercise one piece of code, like a class, module, or function. They shouldn’t need to set up an entire environment or complex dependencies (like databases). They’re usually very thorough and very quick to run.

Integration tests, sometimes called UI tests or system tests, test the end-to-end functionality of your project. They're usually slower to run because they have to initialize an environment, and they tend to be flakier because small changes (like fixing a UI bug) can cause them to fail.

Both kinds of tests are important, but day-to-day you should focus on unit tests because they're more modular and maintainable, and they're cheaper to run. Keeping them isolated usually involves faking the things your code depends on so the real ones don't have to be created. That way you can be pretty sure that if a test fails, it's your code that broke, not something it depends on. These fake stand-ins are often called mocks.
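To make that concrete, here's a minimal sketch using Python's unittest.mock; ReportBuilder and its database dependency are hypothetical names invented for illustration:

from unittest.mock import Mock

class ReportBuilder:
    # hypothetical code under test: depends on a database
    def __init__(self, database):
        self.database = database

    def build(self, user_id):
        user = self.database.get_user(user_id)
        return "Report for " + user["name"]

def test_build_formats_the_users_name():
    fake_db = Mock()  # a mock stands in for the real database
    fake_db.get_user.return_value = {"name": "Alice"}

    report = ReportBuilder(fake_db).build(user_id=42)

    assert report == "Report for Alice"
    fake_db.get_user.assert_called_once_with(42)

No database is ever created, so the test runs in milliseconds, and a failure points at ReportBuilder rather than at the database.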

Integration tests are important as a last line of defense because they usually test what the user actually sees, and in the end that's what matters. All your unit tests could pass while your libraries fail to interact as expected; you won't catch that without an integration test.
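For contrast with the mocked unit test above, here's a minimal sketch of an integration-style test for a tiny Flask app (assuming Flask 2.x). Nothing is faked: the test exercises routing, the handler, and serialization end to end:

from flask import Flask

def create_app():
    app = Flask(__name__)

    @app.post("/signup")
    def signup():
        # a stand-in handler for illustration
        return {"ok": True}

    return app

def test_signup_end_to_end():
    client = create_app().test_client()
    response = client.post("/signup")
    assert response.status_code == 200
    assert response.get_json() == {"ok": True}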

Which to write first: code or tests?

There’s a formal development methodology called Test-Driven Development that dictates writing tests before any feature code. This can be hard to do, because you really don’t know exactly what your interface should look like until you start writing it and realize some of your assumptions need to be changed. On the other hand if you’re not focused on testing during development it’s easy to end up with code that’s hard to use and hard to test.

The sweet spot is to write code and tests in parallel. It's an iterative process: bouncing back and forth between your own perspective and an outsider's (in a test) helps guarantee your code's interface is usable. Think of it like a painter stepping back from the canvas to see the painting as an ordinary viewer would. You'll find that writing tests as you go makes your interfaces better and your code more testable. If something is hard to test, you'll notice early, while there's still time to improve the design.

Avoid flaky tests

Flaky tests are tests that fail some percentage of the time, either because of a rare bug or because of a problem with the way the test is written. Say the test relies on the order in which items are inserted into a hashmap. Hashmaps are unordered, but maybe the iterator returns them in insertion order 98% of the time; the other 2% of the time, your test fails. This leads other people to think their change broke the build, and they waste time hunting bugs that don't exist. If your test is flaky, do whatever you can to remove the flakiness. To achieve this stability, tests should be deterministic: don't depend on a given ordering of threads, and if you need a random number generator, specify a fixed seed in your tests.
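As a small illustrative sketch, fixing the seed makes a test that touches randomness reproducible (pick_winner is a hypothetical function):

import random

def pick_winner(entries):
    # hypothetical code under test: picks a random entry
    return random.choice(entries)

def test_pick_winner_is_reproducible():
    entries = ["alice", "bob", "carol"]
    random.seed(1234)   # fixed seed: the same "random" choice every run
    first = pick_winner(entries)
    random.seed(1234)   # reset to the same seed
    assert pick_winner(entries) == first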

Test the interface, not the implementation

When you're testing, it might seem easiest to dig into your implementation and check that private variables have the right values at the right times. While this seems convenient, it has some problems:

  1. Your tests will need to be rewritten if the internal implementation changes. A tree, say, should have the same behavior whether it's implemented as a flat array or with pointers; if your tests reach into the array, you'll have to rewrite them completely when the implementation switches to pointers.
  2. Mucking with the internals of the implementation takes focus away from what really matters: do the functions that users of this interface will call work as expected?

I'm not advocating that someone unfamiliar with the code write the tests. You know the internals, and you know what kinds of edge cases to test for. But you shouldn't have to reach into private data to test them. Reaching in can cause you to butcher your interface, adding functionality whose only purpose is to verify state in your tests. It's better to focus your energy on the correctness of real use cases.
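As a sketch of the difference, this hypothetical Stack test asserts only on public behavior, so it would survive a rewrite of the internals (to, say, a linked list) unchanged:

class Stack:
    def __init__(self):
        self._items = []  # private detail: tests shouldn't touch this

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

def test_pop_returns_items_in_reverse_order():
    s = Stack()
    s.push("a")
    s.push("b")
    # assert through the public interface only
    assert s.pop() == "b"
    assert s.pop() == "a"
    # a bad test would instead assert s._items == ["a", "b"],
    # which breaks the moment the representation changes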

Try it yourself

As with all my posts in this series, my goal is to provide new engineers a practical framework for being successful in large codebases. You’ll still need to build your own intuition and learn from your own mistakes but hopefully hearing about mine will help the process go faster.

As you grow as an engineer, you realize more and more that testing is a great tool, not an annoying overhead. It's a way to write your code faster, make fewer mistakes, and avoid bug creep. Next time you write code, set aside time to treat yourself to some tests.

Article 1: Trust No One: An Introduction to Large Codebases for New Engineers

Article 3: Quick Tips for Gitting on a Team

