Navigating Unit Tests with Google's Bigquery

Google’s bigquery forms a big part of Aampe’s tech stack and has supported our growth from messaging a few hundred thousand contacts to almost 100 million in a matter of months. It allowed us to scale without having to maintain significant infrastructure or incur massive costs for managed services. We use it to not only ingest and process massive datasets but also build machine learning models to power our intelligence.

With all of its upsides, Bigquery came with some big challenges. One of our largest had to do with automated testing. Growth in both our product offerings and our customers forced our platforms to evolve, and the increased complexity raised concerns about reliability and availability. We had to spend too much time on operational tasks. We needed to build testing suites to kick the tires every time we made changes, but we weren’t sure how this would work when a lot of our complexity and logic were in queries that were meant to run on Bigquery.

We solved the problem by building a test harness that would generate predictable input data, so we could verify the output data. Generating a variety of scenarios in our input data allowed us to test a lot of complicated logic. In our initial iteration, our harness included the following functionality:

Test Infrastructure Setup

We use the pytest for setting up our tests. We set up different datasets in BigQuery for each test we run so multiple engineers running tests as well as a robust CICD infrastructure wouldn’t get in each other’s way. Before each test suite is run, we instantiate the harness, create a dataset in bigquery, and initialize the data generators and validators. Each dataset serves as a namespace for the tables required to test a given scenario.

Data Generators

Most test scenarios required multiple bigquery input tables, and therefore several data generators. The generators typically take as input the number of rows of data to generate, the table name to generate the data in, the date to generate the data for, a config for whether to append or overwrite the data in the table, and possibly several more parameters, depending on the need. A test may call multiple generators to generate each of the needed input tables.

Data Validators

Once all the input data are generated, we run the modules that need to be tested. Results too are generated in bigquery tables. Validators verify these output tables. Some common validators verify things like the total number of rows in the output table, the number of distinct values produced in a given column, or that the number of rows in one table is equal to the number of rows in another. The validators are also able to support more complex needs, like verifying counts for every group after a group by operation, or ensuring the absence of duplicate keys in each group. Validators also have the option to retrieve a filtered set of rows as a pandas dataframe to do custom validations.

Teardown and Cleanup

When the tests have completed running, the test harness tears down the bigquery datasets and their underlying tables to keep our environments clean and keep bigquery costs low.

Unit test example walkthrough

Say we need to test a method that is responsible for assigning messages to contacts from a pool of messages. This method needs two input bigquery tables.

The first table contains a list of contacts, the second table contains the pool of messages.

We initialize the harness, which creates a new dataset in bigquery and also initializes the data generators and validators.
We call the contact list generator and tell it to generate N unique contacts in a table called “contacts”.
We call the message generator and tell it to generate M different messages in a table called “messages“.
We run the method being tested which returns a table name containing message assignments for each contact.
We can use the validators to:
1. Ensure there are n rows (equal to the number of contacts we created).
2. Ensure there are no duplicates in the result table.
3. Ensure no contact is assigned a null message.

In our first week, we launched this harness with 3 tests. The team has since added 200 additional tests. We’ve not only caught several bugs before production, but the tests themselves have also served as documentation for the code, allowing members of the team unfamiliar with parts of the system to approach it with added confidence.