SERP Analysis with Google Search Console+Python

Written by dlite | Published 2019/06/25
Tech Story Tags: seo | data-science

TLDRvia the TL;DR App

SERP Analysis with Google Search Console+Python

Google makes billions and billions from paid search, but did you know those folks without adblockers are only responsible for about 6 percent of web visits? Long live free organic search, the primary diet for almost 2/3rds of all web traffic!

Via https://www.conductor.com/blog/2014/07/organic-search-actually-responsible-64-web-traffic/

If you’ve run a Google Adwords Campaign, you know how sophisticated(or complex) their tools are for analyzing paid traffic. By comparison, Google’s organic search analysis tools are stuck in the horse-and-buggy era of metrics. With so much of your traffic dependent on its appearance on a Search Engine Result Page (SERP), how can you better understand how your site is performing and changing?

In this post, I’ll show you how to quickly download your Google Search Console (GSC) data and perform analysis on it to improve your organic traffic.

The problem with SERP tools: synthetics

Search Google for “SERP Analysis” and you’ll see pages of tools, but there’s a big problem: almost all of these rely on manually running search queries on Google and scraping the results. However, as Google’s results have become more personalized, this kind of synthetic monitoring has become less accurate.

Additionally, Google doesn’t like this bot traffic. This is a reply from John Mueller, Webmaster Trends Analyst at Google on scraping results:

Google estimates that 70% of search traffic is from bots. That’s a lot of infrastructure spend to keep the bots alive — Google is massively incentified to make it harder for bots to cut costs.

What about Google Search Console?

Google provides its own SERP analysis: it’s called Google Search Console (GSC). The beauty of GSC is that 100% of its data comes from real searches, not synthetics.

While I like GSC’s dataset, it provides almost no guidance. Many sites have thousands of search queries that reference their site. It can be difficult knowing how to act on the data and very hard to identify trends in keywords outside the select few you’ve historically monitored. These are areas I’d love GSC to point me to:

  • Quick Wins — What keywords have an underperfoming Click Through Rate (CTR) based on their position in search results?
  • Top Performers — Which keywords have an overperforming CTR? These are good examples for the marketing team to copy as they are doing better than we’d expect.
  • New Keywords — What new significant keywords have appeared in the last 30 days? Automatically detect new keywords — often you don’t know how folks are finding your site.
  • Lost Keywords — What significant keywords have disappeared in the last two weeks? See above, then reverse.
  • Abnormal Behavior — Which significant keywords have abnormal variations in their key metrics (clicks, impressions, position, and CTR) in the last week? Fluctuations in these metrics can have a big impact on traffic. While GSC can compare time periods, it’s a lot of manual work to examine each metric across several time ranges.

Let’s do our own SERP analysis with Python

Don’t your breath for GSC to introduce the insights above. They make money off of paid search, not organic search. They are much more likely to improve their tools that support paid search than organic.

Thankfully, with a bit of API wrangling and some Python-driven data science, we can do our own SERP analysis. Let’s dig in!

Downloading Google Search Console Data

Ah, the machine learning real world: 95% data collection and cleanup, 5% the cool stuff everyone blogs about. Thankfully, this post from Moz walks through a script for downloading your GSC data to a collection of CSV files. The script just worked for me after installing the required Python dependencies. You’ll probably spend more time navigating Google’s Cloud Console than running the script.

stephan765/Google-Search-Console-bulk-query_A script to iterate through the available filters on Google Search Console, minimising sampling issues by extracting…_github.com

Loading the CSVs into a Pandas Dataframe

The Python script above creates one file per day with a row for each keyword. To begin our magic, we need to get the raw data into a Pandas Dataframe, which is basically a super-charged Google sheet that we interact with via Python. Simply paste the content of the gist below into a Jupyter Notebook cell, specifying the path to the folder containing the downloaded CSVs on line 44.

Exploring the dataset

To get our bearings, we’ll start with some high-level analysis.

Plot total clicks over time

Which generates a chart like:

We can see a seasonal pattern in search traffic with clicks decreasing over the weekend. Of course, you could see this same chart in GSC. Be patient — we’re just getting started!

Clicks, impressions, position, and CTR by query

Which generates a dataframe like:

Again, this is very similar to the views GSC provides. Don’t worry — things are about to get more interesting.

Plot Click Through Rate by Position

Which generates a chart like:

This chart is very similar to a power law curve. The CTR of the top position is 63%, then it steadies around 10% from the fifth position on. There’s a lot of value in getting into one of the top two positions.

Let’s move onto our analysis.

Quick Wins

Keywords that have many impressions but have a CTR in the lower 20% of your dataset are outliers — we’d expect them to be generating more clicks. Let’s identify these keywords:

Which creates a dataframe like:

Look for keywords that have:

  • A poor position (≥ 20) — this indicates a relatively popular search for your website.
  • A strong position (≤ 5) — Why aren’t folks clicking on this link? Check for accurate page titles and descriptions.

Top Performers

These keywords are the little engine that could — their average position is in the lower 60% but have a CTR greater than the median.

Users are working extra hard to click on these results. Think about how you can bump these up.

Lost Keywords

Knowing when you’ve disappeared from Google’s search results can point you to possible content issues on your site.

Abnormal Behavior

There’s too much noise in GSC — which can easily contain thousands of keywords — to notice when something isn’t right for specific keywords. Let’s do some data science.

To start, lets get a list of our top 10 queries by clicks. We care most about abnormal behavior in these traffic-driving queries:

Next, we’ll visualize abnormal impressions for each of our top queries by click using univariate anomaly detection (much code from this post by @attractchinese):

This will generate one plot for each of our top queries (so a total of ten plots). They should look like this:

Let’s see if there are any abnormal impressions per-day for our top queries over the past two weeks:

Which will crudely print:

Summary

Your organic traffic deserves extra attention because it’s likely your top source of traffic. Using Python, we can download real (not synthetic) data from Google Search Console and conduct our own SERP Analysis.


Written by dlite | Working on booklet.ai, co-founded Scout Server Monitoring (acq. 2017) & ScoutAPM (acq. 2018).
Published by HackerNoon on 2019/06/25