Size Does Matter: Global Control Group for a Bank

The Task

A few years ago, I worked in the customer relationship management (CRM) department of a large retail bank as the team lead of data analytics. As a CRM team, we were responsible for a variety of activities with the existing customer base: cross-selling through different channels (e-mails, phone calls, SMS, etc.), notifications, product customization, and so on. Because we had a control group for each campaign, we could always quantify what an individual campaign brought in (extra sales, extra loan portfolio, etc.). Still, there was a constant dispute about how much these activities brought to the bank in total.

Typical questions included:

Is it correct to sum all the incremental effects of all the campaigns we launch?
How can we synchronize campaigns that began at different time points?
How much would we underearn if we stop all our emails, SMS, and phone calls?

How We Answered Those Questions

To answer those questions, we proposed a so-called global control group (GCG from here on) - a group of clients that would be frozen from any contact from our side for a year. We would freeze these clients from communications, leave the rest of the customer base untouched, and then compare the two groups on a predefined metric. Examples of such a metric are the number of loans sold, the total loan portfolio per client, or the average profit per client.

The good news was that management liked the idea. The bad news was that nobody wanted to stop selling extra products, even to a fraction of clients: "Are you really going to stop selling to some group of clients? That will be a lost opportunity!" The discussion about the size of the GCG instantly became a hard one.

During the dispute we agreed that, as an analytics team, we would calculate the right minimum sample size for the global control group from a statistical point of view.

How We Chose the Sample Size

Intuition says that the more clients we include in the GCG, the more statistical power our measurement has. But, as discussed, it was hard to freeze too many clients from our sales activities. The technique explained below can also be applied to selecting an ordinary control group (or to A/B testing). However, for an ordinary cross-selling campaign, the pressure to minimize the control group is typically much weaker than it is for a GCG.

The Metric

In the example below, I will speak in terms of one metric - the fraction of clients who purchased extra cash loans (let's call it the target metric). Although there may be a variety of metrics, the general approach is very similar.

If you do not want to go into mathematical details, here is a short explanation of how we approached the issue:

First, we defined the incremental value of the target metric that we consider the minimum plausible impact from a business point of view. For example, an extra 0.01 (one percentage point) penetration of loans into our customer base may be considered the minimum feasible business effect of our activity. We simply do not believe that our activity brings less.
Then, using probability theory, we calculated the minimum GCG size that ensures that, with at least 95% probability, statistical testing does not miss this minimum business effect when it exists. The 95% threshold is debatable and can be adjusted to the problem at hand, but the general approach stays the same.

This resulted in around 3% of the entire customer base as the proposed proportion for the GCG. If you are interested in only the business logic you may skip the following section with a detailed description of our statistical approach.

Detailed Statistical Approach

Definitions:

N - the number of clients in our customer base
alpha - the fraction of N that goes to the global control group (it is the parameter we are trying to define properly)
N1 - the number of clients in GCG = alpha*N
N2 - the number of clients in the rest of the customer base outside the GCG = (1 - alpha)*N
n1 - the number of clients from GCG who had an extra cash loan
n2 - the number of clients outside the GCG who had an extra cash loan
p1 = n1 / N1 - penetration of loans into GCG
p2 = n2 / N2 - penetration of loans into the rest of customer base
Obviously if p2 > p1 then our activity brings extra value p2 - p1 which of course is subject to statistical testing.

For statistical testing we use the following steps:

Calculate delta = p2 - p1 = n2/N2 - n1/N1
Test delta for statistical significance.

To address the question of the GCG size, we use the following logic: what are the consequences of picking the wrong size for GCG? There are two possibilities:

We do not have a significant business impact, but we see a significant effect in p2 - p1 in our data (Type 1 error or False positive)
We have a significant business impact, but we see no statistical significance in n2/N2 - n1/N1 (Type 2 error or False negative)

In statistical testing we typically control the probability of False positives (Type 1 error) through the significance level of the test. So, when we have the results for the GCG and for the rest of the customer base, we set the significance level of our t-test to 5% (that is, a 95% confidence level), which means there is a 5% probability of a Type 1 error.
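As an illustration, the test could be sketched in Python as follows. This is a minimal sketch, not the exact procedure we ran: it uses a one-sided large-sample z-test for two proportions as a stand-in for the test mentioned above (with samples of this size the two are practically equivalent), and the function name two_proportion_z_test and the counts in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(n1, N1, n2, N2, sig_level=0.05):
    """One-sided large-sample test of H0: m2 <= m1 against H1: m2 > m1.

    Assumption: a z-test is used here for illustration only.
    """
    p1 = n1 / N1  # penetration of loans into the GCG
    p2 = n2 / N2  # penetration of loans into the rest of the customer base
    delta = p2 - p1
    # Unpooled standard error: each group contributes p*(1-p)/N
    se = sqrt(p1 * (1 - p1) / N1 + p2 * (1 - p2) / N2)
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = delta / se
    p_value = 1 - NormalDist().cdf(z)  # probability of a z at least this large under H0
    return delta, z, p_value, p_value < sig_level


# Hypothetical counts: 30,000 clients in the GCG, 970,000 outside it
delta, z, p_value, significant = two_proportion_z_test(3000, 30000, 106700, 970000)
print(delta, z, p_value, significant)
```

The last returned value says whether the observed difference is significant at the chosen significance level.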

This means that what we should care about is the Type 2 error - the probability of False negatives, which is the value that varies when we change the size of our control group. In other words, by selecting the GCG size, we choose the probability of missing a business effect that actually exists. So, the issue is to understand how prob(False negative) depends on the fraction of our customer base (alpha) that we freeze from communications. The next step is to pick the alpha that gives us our target prob(False negative).

If we think in statistical terms, we test p2 against p1. Both p1 and p2 are random values with asymptotically normal distributions N(m1, m1*(1-m1)/N1) and N(m2, m2*(1-m2)/N2), respectively, where m1 and m2 are the unobservable true fractions of clients purchasing our extra cash loan (p1 and p2 are simply estimates of m1 and m2). So, testing can be illustrated with the following picture:

Here, a false negative means that there is a real difference m2 - m1 but we fail to detect it. The probability of that event is colored red in the image above. Intuitively, the GCG size matters because increasing alpha makes the distribution of p1 narrower (a greater N1 reduces the variance m1*(1-m1)/N1). Graphically speaking, the orange line moves to the left, reducing the Type 2 error. On the other hand, the distribution of p2 becomes a little wider (a lower N2 = (1-alpha)*N increases the variance m2*(1-m2)/N2). Since alpha << 1, the first effect dominates: N1 is much smaller than N2, so a small increase in alpha narrows the distribution of p1 far more than it widens the distribution of p2, and the net result is a lower Type 2 error.
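These two opposite effects can be written down explicitly. As a sketch, assuming the GCG and the rest of the customer base are independent samples (an assumption not stated above), the variance of the difference and its derivative with respect to alpha are approximately:

```latex
\[
\operatorname{Var}(p_2 - p_1) \approx \frac{m_1(1-m_1)}{\alpha N} + \frac{m_2(1-m_2)}{(1-\alpha) N}
\]
\[
\frac{\partial}{\partial \alpha}\operatorname{Var}(p_2 - p_1) \approx -\frac{m_1(1-m_1)}{\alpha^2 N} + \frac{m_2(1-m_2)}{(1-\alpha)^2 N}
\]
```

The derivative is negative whenever alpha/(1-alpha) < sqrt(m1*(1-m1) / (m2*(1-m2))). When m1 and m2 are close to each other, this holds for any alpha below roughly 1/2, so for the small values of alpha considered here, increasing alpha reduces the variance of p2 - p1.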

Attentive readers looking at the picture above can derive the exact formula for the False negative probability as a function of alpha (using the asymptotically normal distributions): it equals the red-colored area in the picture. To evaluate it, however, we need assumptions about m1 and m2 (the expectations of p1 and p2).

This is done simply through the following steps:

First, we set m1, which can be taken from the bank’s business plan for the next year. Suppose that it equals 0.1 (this means that we assume that the bank is going to sell extra cash loans to 10% of its customer base next year).
Then, we ask our CRM business team: what incremental value of the target metric do you consider to be the minimum feasible business effect of your activity? For example, they call it delta = 0.01 (which means that they believe that they can sell cash loans to at least an extra 1% of our customer base next year).
m2 is calculated simply by: m2 = m1 + delta

Having these assumptions we can calculate our False negative probability for different alpha values. In other words, we can calculate the probability of missing the minimum feasible business effect while it exists for different GCG sizes and then pick the size of GCG that gives us a good tradeoff between missing the effect and not turning off too many clients from our sales activities.
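This calculation can be sketched in Python. It is a minimal sketch under stated assumptions, not our exact computation: it uses the normal approximation with a one-sided test at the 5% significance level, it uses the same unpooled standard error for the critical value and for the alternative, and the customer base size of 1,000,000 is hypothetical. The function names false_negative_prob and min_alpha are also introduced only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def false_negative_prob(alpha, n_total, m1, delta, sig_level=0.05):
    """Approximate Type 2 error of a one-sided test for m2 > m1.

    alpha   - fraction of the customer base placed in the GCG
    n_total - size of the customer base (N)
    m1      - assumed true penetration in the GCG
    delta   - minimum feasible business effect, so that m2 = m1 + delta
    """
    m2 = m1 + delta
    n1 = alpha * n_total        # clients in the GCG
    n2 = (1 - alpha) * n_total  # clients outside the GCG
    se = sqrt(m1 * (1 - m1) / n1 + m2 * (1 - m2) / n2)
    z_crit = NormalDist().inv_cdf(1 - sig_level)
    # We fail to reject when (p2 - p1) / se <= z_crit,
    # while p2 - p1 is approximately N(delta, se^2)
    return NormalDist().cdf(z_crit - delta / se)


def min_alpha(n_total, m1, delta, target=0.05, sig_level=0.05):
    """Smallest alpha on a 0.1% grid whose False negative probability is at most target."""
    for k in range(1, 500):
        alpha = k / 1000
        if false_negative_prob(alpha, n_total, m1, delta, sig_level) <= target:
            return alpha
    return None


N = 1_000_000  # hypothetical customer base size
for a in (0.005, 0.01, 0.02, 0.03):
    print(a, false_negative_prob(a, N, m1=0.1, delta=0.01))
print(min_alpha(N, m1=0.1, delta=0.01))
```

The result depends heavily on the absolute size of the customer base: for a smaller N, the same False negative probability requires a larger alpha.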

In our case we agreed to have alpha = 3% which resulted in less than 1% probability of False negatives.

Getting Things Together

OK, now we know how to divide existing clients (we know the right alpha - the proportion of our customer base that should randomly go into the GCG). But what should we do with new ones? Every day some new clients join our customer base. The answer is pretty simple: each new client is randomly put into the GCG with probability alpha, or into the rest of the customer base with probability 1-alpha.
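The assignment rule for new clients can be sketched as follows. This is an illustrative sketch only: the function name assign_group, the group labels, and the seed are hypothetical, and in practice the assignment would presumably be stored once per client so that it does not change later.

```python
import random


def assign_group(alpha, rng):
    """Put a new client into the GCG with probability alpha, otherwise into the rest."""
    return "GCG" if rng.random() < alpha else "rest"


rng = random.Random(2024)  # fixed seed only to make the illustration reproducible
groups = [assign_group(0.03, rng) for _ in range(100_000)]
share_in_gcg = groups.count("GCG") / len(groups)
print(share_in_gcg)  # should be close to 0.03
```

Because every client is assigned independently with the same probability, the GCG share stays close to alpha as the customer base grows.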

The Results

As discussed above, we managed to set the GCG size to 3% of all clients. After a year of collecting observations, we found that every other cash loan sold to our clients was due to CRM activity. This was an unexpected result, as it was far better than we had assumed. Those interested in mathematical statistics may now carry out calculations to derive the lost selling opportunity of not selling extra products to GCG clients (3% of the whole customer base). But this is the price we pay for our knowledge about the incremental business value that we bring as a CRM team.