How to Implement First Touch Attribution with RudderStack

Written by shikhar-bhuddi | Published 2023/05/25

TL;DR: Attribution can be an overwhelming problem to solve. Use a CDP like RudderStack, and employ its JS source to solve for your first touch attribution logic and customer journeys easily.

Everybody says “attribution is the holy grail of marketing”, but how many growth teams actually go the extra mile to build a robust and accurate attribution model?

With an explosion of channels and devices, prospects are interacting with brands from varied sources, and customer acquisition channels do not talk to each other (unless you make them do so).

Therefore, GTM teams have two options – rely on gut feel and assumptions about customer journeys, or collect and model the touchpoints that a customer has with their brand. Either way, as GTM becomes more and more revenue-centric, the answer to which channels are working should be precisely defined and should reflect the truth.

What is attribution and why is it important?

For the uninitiated (founders and non-marketing folks) – attribution in marketing refers to the process of identifying and assigning credit to the marketing channels or touchpoints that contributed to a desired customer action, such as making a purchase or completing a lead form. The goal of attribution is to understand which marketing efforts are most effective in driving conversions and to optimise marketing spend accordingly.

There are several different attribution models that marketers can use, including first-touch attribution, last-touch attribution, and multi-touch attribution. First-touch attribution gives credit to the first touchpoint a customer has with a brand, while last-touch attribution gives credit to the final touchpoint before a conversion. Multi-touch attribution assigns credit to all touchpoints along the customer journey.
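
To make the three models concrete, here is a minimal JavaScript sketch that assigns credit for one conversion across an ordered list of touchpoints. The function name `assignCredit`, the sample channel names, and the equal split used for the multi-touch case are assumptions made for illustration; real multi-touch models often weight touchpoints differently.

```javascript
// Illustrative only: assignCredit and the sample channels are hypothetical.
// `touchpoints` is an ordered list of channel names, earliest first.
function assignCredit(touchpoints, model) {
  const credit = {};
  if (touchpoints.length === 0) return credit;

  // Adds a share of the conversion to a channel's running total.
  const add = (channel, share) => {
    credit[channel] = (credit[channel] || 0) + share;
  };

  switch (model) {
    case "first-touch":
      add(touchpoints[0], 1);
      break;
    case "last-touch":
      add(touchpoints[touchpoints.length - 1], 1);
      break;
    case "multi-touch":
      // Assumption: a linear model that splits credit equally.
      for (const channel of touchpoints) add(channel, 1 / touchpoints.length);
      break;
    default:
      throw new Error(`Unknown attribution model: ${model}`);
  }
  return credit;
}

const journey = ["paid-search", "email", "organic", "email"];
console.log(assignCredit(journey, "first-touch"));
console.log(assignCredit(journey, "multi-touch"));
```

With the sample journey, first-touch gives all credit to the first channel in the list, while the linear multi-touch variant gives a channel that appears twice double the share of one that appears once.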

Effective attribution requires the use of data analytics and marketing automation tools to track customer interactions across channels and measure the impact of each touchpoint on the customer journey.

The role CDPs can play in attribution modelling

A customer data platform (CDP) is a centralized system that collects, integrates, and manages customer data from various sources to create a unified customer profile.

CDPs can integrate with various data sources, including websites, apps, customer relationship management (CRM) systems, marketing automation platforms, and other third-party data sources. Additionally, they offer a marketer-friendly interface, allowing marketers to easily access and utilize customer data without relying on IT or technical resources.

Because CDPs continuously extract, transform and load customer data into data warehouses, the destination warehouse becomes the source of truth for customer journeys.

The more channels a CDP integrates, the more complete the view of the customer journey becomes.

Hence, CDPs are central to attribution modelling. All modelling logic can be built in CDPs and visualized with a business intelligence tool of choice.

What is RudderStack and how does it work?

RudderStack is one of the up-and-coming CDPs, built with a warehouse-first architecture. Its freemium offering allows organisations to set it up quickly and experience the utility of a CDP without any purchase.

CDPs, including RudderStack, have 3 major elements –

  1. Sources – these are ways to extract/capture data from different platforms (third party cloud apps, your web and app properties, server integrations etc.).
  2. Destinations – these are data storage management systems or cloud apps that receive data from sources.
  3. Transformations – these are methods to transform and clean data before sending data to the destination.
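
As an illustration of the third element, the sketch below shows the kind of clean-up a transformation might perform. To my knowledge RudderStack's JavaScript transformations are written as a `transformEvent(event, metadata)` function (declared with `export` in its editor), but treat that, along with the field paths and the two clean-up rules used here, as assumptions rather than a reference implementation.

```javascript
// Sketch of a transformation. In RudderStack's editor the function would be
// declared with `export`. The field paths below are assumptions about the payload.
function transformEvent(event, metadata) {
  // Drop page views recorded on local development hosts.
  const url = (event.properties && event.properties.url) || "";
  if (event.type === "page" && url.includes("localhost")) {
    return; // returning nothing drops the event
  }

  // Normalize the email trait so the same address always matches in the warehouse.
  const traits = event.context && event.context.traits;
  if (traits && typeof traits.email === "string") {
    traits.email = traits.email.trim().toLowerCase();
  }
  return event;
}
```

Cleaning data at this stage means every destination receives the same normalized values, so the warehouse queries do not have to repeat the clean-up.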

Newer CDPs like RudderStack also have a feature called Reverse ETL that extracts data from destinations and sends it to other cloud applications/destinations. This is usually done to activate and utilize the collected data.

How do you use RudderStack for first touch attribution modelling?

One of the most interesting use cases of RudderStack that I have discovered is using its JavaScript SDK as a source for collecting page views and other interactions on website properties, and building a customer journey out of them.

Here’s a step-by-step guide to visualizing the customer journey and identifying the first touch of any lead –

Step 1 – Sign up for RudderStack here.

Step 2 – Visit ‘Sources’ inside the RudderStack console, and click on ‘New Source’.

Step 3 – Select ‘Javascript’ from the grid of all the sources.

Step 4 – Name the source ‘website_prod_test’, then copy the snippet that’s generated.

Step 5 – Install the JavaScript snippet on your website using Google Tag Manager or your CMS’s native custom script capability.

Step 6 – Visit your website and you’ll start to see events in the event flow within the source. Every time a new website visitor arrives, RudderStack assigns them an anonymous ID and starts tracking all page views on the website. The anonymous ID is stored in a cookie in the visitor’s browser, so even when the visitor returns in a later session, page views are still tracked against the same visitor.
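
The persistence described in Step 6 can be pictured with the sketch below. This is not RudderStack's implementation: `getOrCreateAnonymousId`, the storage key, the ID format, and the `Map` standing in for browser storage are all hypothetical. It only shows the idea of reusing a stored ID when one exists and creating one otherwise.

```javascript
// Hypothetical illustration, not RudderStack's actual code.
function getOrCreateAnonymousId(store, generateId) {
  const KEY = "anonymous_id"; // assumed key name
  let id = store.get(KEY);
  if (!id) {
    // First visit: create an ID and persist it for later sessions.
    id = generateId();
    store.set(KEY, id);
  }
  return id;
}

// A Map stands in for the browser's persistent storage.
const browserStore = new Map();
let counter = 0;
const makeId = () => `visitor-${++counter}`;

const firstSession = getOrCreateAnonymousId(browserStore, makeId);
const laterSession = getOrCreateAnonymousId(browserStore, makeId);
console.log(firstSession === laterSession);
```

Because the ID lives in the browser, a visitor who clears their storage or switches devices would appear as a new anonymous visitor.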

Step 7 – Set up the Google BigQuery destination following the official instructions provided by RudderStack here.

Step 8 – Go to the sources tab, select your Javascript source, go to overview and connect your BigQuery destination. Once the destination is connected, the event stream from your JS source will start to flow into your Google BigQuery data warehouse destination.

Step 9 – Next, you need to identify your website visitors once they have filled in a form on your website, using an identify call. Let’s say that when a visitor fills in a form, they land on a thank-you page, and the visitor’s email address is added to localStorage as an item. You need to execute the following code to identify the visitor and map the email address to the anonymous ID.

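// Runs on the thank-you page, after the form has stored the email in localStorage.
rudderanalytics.identify(
    // Reuse the anonymous ID as the user ID so the email maps to the same visitor.
    rudderanalytics.getAnonymousId(),
    {
        email: localStorage.getItem("email"),
    },
    () => {
        // Optional callback, invoked after the identify call.
        console.log("identify call");
    }
);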
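
Step 9 assumes the email address is already in localStorage when the thank-you page loads. One way this could be arranged is sketched below. `rememberEmail` and `identifyStoredVisitor` are hypothetical helpers, not part of the RudderStack SDK. `storage` is any object with `getItem` and `setItem` (localStorage in a browser), and `analytics` is assumed to be the loaded `rudderanalytics` object.

```javascript
// Hypothetical helpers; these names are not part of the RudderStack SDK.

// Call from the form's submit handler, before redirecting to the thank-you page.
function rememberEmail(storage, email) {
  const cleaned = String(email).trim().toLowerCase();
  // Very loose sanity check; real validation belongs in the form itself.
  if (cleaned.includes("@")) storage.setItem("email", cleaned);
}

// Call on the thank-you page. Returns true when an identify call was made.
function identifyStoredVisitor(analytics, storage) {
  const email = storage.getItem("email");
  if (!email) return false; // nothing to map yet
  analytics.identify(analytics.getAnonymousId(), { email });
  return true;
}
```

In a browser this might be wired up as `rememberEmail(localStorage, emailInput.value)` in the submit handler, where `emailInput` is your form's email field, and `identifyStoredVisitor(rudderanalytics, localStorage)` on the thank-you page. Skipping the identify call when no email is stored avoids writing user rows with an empty email to the warehouse.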

Step 10 – Finally, once you have made the identify call as shown in the last step, the identify event is passed to the Google BigQuery data warehouse. Run a SQL query like the following to get the first page visited by each identified visitor (i.e. leads who filled in the form):

WITH
  added_row_number AS (
  SELECT
    p.anonymous_id AS `anonymous_id`,
    p.context_page_url AS `context_page_url`,
    p.referrer AS `referrer`,
    p.original_timestamp AS `timestamp`,
    ROW_NUMBER() OVER(PARTITION BY p.anonymous_id ORDER BY p.original_timestamp ASC) AS row_number
  FROM
    `website_prod_test.pages_view` p),
  journey AS (
  SELECT
    u.email,
    added_row_number.`anonymous_id` AS `id`,
    added_row_number.`context_page_url` AS `first_page_seen`,
    added_row_number.`referrer`,
    added_row_number.`timestamp`,
    added_row_number.row_number,
    CASE 
    WHEN CONTAINS_SUBSTR(added_row_number.`context_page_url`, "utm_medium=cpc") THEN added_row_number.`context_page_url`
    WHEN CONTAINS_SUBSTR(added_row_number.`referrer`, "utm_medium=cpc") THEN added_row_number.`referrer`
    WHEN CONTAINS_SUBSTR(added_row_number.`context_page_url`, "utm_medium=email") THEN added_row_number.`context_page_url`
    WHEN CONTAINS_SUBSTR(added_row_number.`referrer`, "utm_medium=email") THEN added_row_number.`referrer`
    ELSE added_row_number.`context_page_url`
    END
    AS `attribution_url`,
  FROM
    `website_prod_test.users_view` u
  LEFT JOIN
    added_row_number
  ON
    u.id = added_row_number.anonymous_id)
SELECT
  *,
  REGEXP_EXTRACT(attribution_url, r".*[&?]utm_source=([^&]+).*") AS `utm_source`,
  REGEXP_EXTRACT(attribution_url, r".*[&?]utm_medium=([^&]+).*") AS `utm_medium`,
  REGEXP_EXTRACT(attribution_url, r".*[&?]utm_campaign=([^&]+).*") AS `utm_campaign`,
  REGEXP_EXTRACT(attribution_url, r".*[&?]utm_content=([^&]+).*") AS `utm_content`,
  REPLACE(REGEXP_EXTRACT(attribution_url, r".*[&?]utm_term=([^&]+).*"), "%20", " ") AS `utm_term`
FROM
  journey
WHERE
  journey.row_number = 1
  ORDER BY journey.timestamp DESC
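
For readers who find the SQL hard to follow, the same logic can be sketched in JavaScript over in-memory rows. This is only an illustration of what the query does, not something to run against the warehouse. `pickAttributionUrl`, `extractUtm` and `firstTouch` are hypothetical names, and the row fields mirror the columns used in the query.

```javascript
// Sketch of the query's logic over in-memory rows; all names are illustrative.
const UTM_KEYS = ["source", "medium", "campaign", "content", "term"];

// Mirrors the CASE expression: prefer a URL tagged cpc, then one tagged email.
function pickAttributionUrl(pageUrl, referrer) {
  const page = pageUrl || "";
  const ref = referrer || "";
  for (const tag of ["utm_medium=cpc", "utm_medium=email"]) {
    if (page.toLowerCase().includes(tag)) return page;
    if (ref.toLowerCase().includes(tag)) return ref;
  }
  return page;
}

// Mirrors the REGEXP_EXTRACT calls; a parameter that is absent becomes null.
function extractUtm(url) {
  const utm = {};
  for (const key of UTM_KEYS) {
    const match = url.match(new RegExp(`[&?]utm_${key}=([^&]+)`));
    utm[`utm_${key}`] = match ? match[1] : null;
  }
  // Same %20 clean-up the query applies to utm_term.
  if (utm.utm_term) utm.utm_term = utm.utm_term.replace(/%20/g, " ");
  return utm;
}

// pages: rows with anonymous_id, context_page_url, referrer, original_timestamp.
// users: rows with id (set to the anonymous ID in Step 9) and email.
function firstTouch(pages, users) {
  // Equivalent of ROW_NUMBER() = 1: keep the earliest page view per visitor.
  const earliest = new Map();
  for (const page of pages) {
    const seen = earliest.get(page.anonymous_id);
    if (!seen || new Date(page.original_timestamp) < new Date(seen.original_timestamp)) {
      earliest.set(page.anonymous_id, page);
    }
  }
  // Identified users without any page view drop out, as in the query's WHERE clause.
  return users
    .filter((user) => earliest.has(user.id))
    .map((user) => {
      const page = earliest.get(user.id);
      const attributionUrl = pickAttributionUrl(page.context_page_url, page.referrer);
      return {
        email: user.email,
        first_page_seen: page.context_page_url,
        referrer: page.referrer,
        ...extractUtm(attributionUrl),
      };
    });
}
```

Note that only visitors who made an identify call appear in the result, since the join starts from the users table. Anonymous visitors who never filled in the form are left out.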

Step 11 – Once you run this query, the Google BigQuery interface will give you an option to visualize the result using Looker Studio. You can follow our guide to Looker Studio for marketers or refer to the Looker Studio documentation to quickly generate a dashboard. It will show you the journey taken by an individual along with the timestamp and the referrer.

This was a quick, super-specific method of building a low-cost first-touch attribution model using RudderStack. There’s so much more to RudderStack, CDPs and attribution modelling. If you’re interested in implementing a CDP for your SaaS product, reach out to me.

This article was originally posted here.


Written by shikhar-bhuddi | Blending Technology & Marketing together to solve critical issues of growth!
Published by HackerNoon on 2023/05/25