Unveiling Causal Impact: From Theory to Practice

Hey everyone!

Welcome to the second part of our exploration of Causal Impact analysis. In the first part, we took a quick look at the theory behind it. Now, let's roll up our sleeves and apply it in practice.

We will walk through a specific dataset, showing how to use the library and how to interpret its results.

This hands-on approach empowers you not only to use the Causal Impact library effectively but also to draw meaningful conclusions from your analyses.

How to Use Causal Impact

Imagine launching a broad advertising campaign in the UK to promote a new app feature, with the aim of increasing installs by reaching a larger audience through bloggers. A classic A/B test is awkward here: holding back part of that audience as a control group that never sees the new feature might create a negative impression.

To address this, we roll out the feature across all of Region B, while Region A serves as the control group without the brand campaign.

Our Goal

Compare the two groups, and find out whether the brand campaign has influenced the growth of installations in the region or not.

Dataset Description

Daily installs in Region A: control group without a brand campaign.

Daily installs in Region B: test group with the brand campaign.

Assumption: The two regions have a similar audience structure, and the control group's installs are reasonably correlated with the test group's installs.
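
One rough way to sanity-check this assumption is to measure how closely the two series move together before the campaign starts. The sketch below uses made-up numbers and hypothetical column names (y for the test region, x for the control region). What counts as a "reasonable" correlation is a judgment call, and the 0.8 threshold is only an illustrative assumption.

```python
import pandas as pd

# Made-up pre-campaign daily installs, for illustration only.
pre_period = pd.DataFrame(
    {
        "y": [120, 135, 128, 150, 160, 142, 155],  # test region (Region B)
        "x": [100, 112, 108, 126, 133, 119, 130],  # control region (Region A)
    },
    index=pd.date_range("2023-01-01", periods=7, freq="D"),
)

# Pearson correlation between the two regions before the intervention.
pre_corr = pre_period["y"].corr(pre_period["x"])

# Illustrative threshold (an assumption, not a rule from the library).
MIN_CORR = 0.8
control_is_usable = bool(pre_corr >= MIN_CORR)
print(f"Pre-period correlation: {pre_corr:.2f}, usable control: {control_is_usable}")
```

If the correlation is weak, the control region tells the model little about the test region, and the predicted baseline becomes correspondingly less trustworthy.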

Solving the Problem

Step 1: Packages

First, install the package with pip (the import below assumes the distribution that provides the causal_impact module), then import the CausalImpact class:

from causal_impact import CausalImpact

Step 2: Dataset Preparation

Load the dataset you want to analyze. Below is an example dataset with installs:

| date       | y_installs (Test Group) | x_installs (Control Group) |
|------------|--------------------------|----------------------------|
| 2023-01-01 | ...                      | ...                        |
| 2023-01-02 | ...                      | ...                        |
| ...        | ...                      | ...                        |
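
As a minimal sketch of this step, the snippet below builds a small DataFrame in the shape the later steps expect: dates as the index, with the test-group series first and the control-group series second. The values are placeholders invented for illustration. In practice you would load your own export (for example with pd.read_csv) instead.

```python
import pandas as pd

# Placeholder values standing in for a real export of daily installs.
raw = pd.DataFrame(
    {
        "date": ["2023-01-01", "2023-01-02", "2023-01-03"],
        "y_installs": [120, 135, 128],  # test group (Region B)
        "x_installs": [100, 112, 108],  # control group (Region A)
    }
)

# Use the date column as the index so only the two series remain as columns.
df = raw.set_index("date").sort_index()

# Quick sanity checks before modelling: no missing values, no duplicate dates.
has_missing = bool(df.isna().any().any())
has_duplicate_dates = not df.index.is_unique
print(df.head())
print(f"missing values: {has_missing}, duplicate dates: {has_duplicate_dates}")
```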

Step 3: Calculations and Graph

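date_infer = '2023-01-18'  # Date of feature rollout (the intervention date)
df.columns = ['y', 'x']  # y: test group installs, x: control group installs
# n_seasons=7 presumably models a weekly cycle in the daily data
ci = CausalImpact(df, date_infer, n_seasons=7)
# Fit the model and keep the results as a DataFrame
result = ci.run(max_iter=1000, return_df=True)
ci.plot()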

Step 4: Reading Graph Results

Causal Impact produces several outputs, but two are especially useful: the plots generated by ci.plot() and a summary of impact.

Plots:

Observation vs Predictions:

Blue line: Synthetic baseline created by the control group.
Black line: Actual performance of the test group.
Red dotted line: Estimated performance of the test group without intervention.
Grey outline: Bounds of a 95% confidence interval.
Vertical line: Represents the intervention date.

Difference and Cumulative Impact:

These plots show the pointwise and the cumulative difference between the observed outcome and the predicted outcome.
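
To make the two curves concrete, here is a small sketch with invented numbers: the pointwise difference is observed minus predicted for each day, and the cumulative difference is its running sum over the post-intervention period. The arrays are hypothetical and are not output from the library.

```python
import numpy as np

# Invented post-intervention values, for illustration only.
observed = np.array([150.0, 160.0, 170.0, 165.0])   # actual installs in the test region
predicted = np.array([130.0, 135.0, 140.0, 138.0])  # estimate without the campaign

pointwise_diff = observed - predicted         # effect on each day
cumulative_diff = np.cumsum(pointwise_diff)   # running total of the effect

print(pointwise_diff.tolist())   # [20.0, 25.0, 30.0, 27.0]
print(cumulative_diff.tolist())  # [20.0, 45.0, 75.0, 102.0]
```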

The effect is statistically significant wherever the shaded 95% confidence interval lies entirely above or entirely below the zero line, that is, wherever the interval does not include zero.
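
This check can also be done numerically: a day counts as significant when both bounds of the interval have the same sign. The sketch below uses invented bounds and hypothetical column names (diff_lower, diff_upper), which are assumptions for illustration rather than the library's actual output columns.

```python
import pandas as pd

# Invented interval bounds for the observed-minus-predicted difference.
bounds = pd.DataFrame(
    {
        "diff_lower": [-5.0, 2.0, 8.0, -12.0],
        "diff_upper": [10.0, 20.0, 30.0, -1.0],
    }
)

# The interval excludes zero when it sits fully above or fully below zero.
bounds["significant"] = (bounds["diff_lower"] > 0) | (bounds["diff_upper"] < 0)
print(bounds["significant"].tolist())  # [False, True, True, True]
```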

Relative Uplift Calculation:

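import numpy as np
# Position of the intervention date in the original index
index_infer = np.where(np.array(df.index) == date_infer)[0][0]
post_infer_result = result[result.index >= index_infer]
# Difference relative to the predicted (no-campaign) value, per day
rel_diff = post_infer_result.pred_diff / post_infer_result.pred
print('Relative uplift is {}%'.format(np.round(rel_diff.mean() * 100, 1)))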

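This calculation expresses the post-intervention difference between observed and predicted installs as a percentage of the prediction, averaged over the post-intervention period. It gives a single, easy-to-communicate number for the effectiveness of the advertising campaign.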
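
As a worked example with invented numbers (not results from this analysis), the same formula applied to three post-intervention days looks like this. The column names pred and pred_diff mirror the ones used in the snippet above.

```python
import numpy as np
import pandas as pd

# Invented post-intervention rows, for illustration only.
post = pd.DataFrame(
    {
        "pred": [100.0, 200.0, 400.0],    # predicted installs without the campaign
        "pred_diff": [10.0, 30.0, 40.0],  # observed minus predicted
    }
)

# Relative difference per day (0.10, 0.15, 0.10), then its average.
rel_diff = post["pred_diff"] / post["pred"]
uplift_pct = np.round(rel_diff.mean() * 100, 1)
print(f"Relative uplift is {uplift_pct}%")  # Relative uplift is 11.7%
```

Note that averaging the daily ratios weights every day equally. Dividing the total pred_diff by the total pred (80 / 700, about 11.4% here) would instead weight days by their volume, so it is worth stating which definition you report.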

Conclusion

By comparing the test and control groups, we gained insights into whether the brand campaign influenced the growth of installations in the region.

The visualizations, especially the Observation vs Predictions and the Difference and Cumulative Impact plots, show how large the intervention's effect is and whether it is statistically significant.

Armed with this hands-on experience, you are now equipped to leverage Causal Impact analysis in real-world scenarios, making informed decisions based on statistical significance and relative uplift. Cheers to unraveling insights and driving data-driven strategies!