How to Scrape Data from Google Maps

Written by darshan12 | Published 2022/11/20
Tech Story Tags: javascript | beginners | tutorial | programming | nodejs | data-analytics | big-data | web-scraping

TL;DR: Google Maps data is crucial for review software companies and data miners, as it contains user ratings, user reviews, addresses, and images of a particular place. In this tutorial, we will learn how to scrape this valuable Google Maps information with the help of Node JS.

Introduction

Google Maps data is crucial for review software companies, sentiment analysts, and data miners, as it contains information like user ratings, user reviews, addresses, and images of a particular place.

In this tutorial, we will learn how to scrape this valuable Google Maps information with the help of Node JS. At the end, we will see how Serpdog's Google Maps Reviews API can help you scrape Google Maps reviews without the extra effort that scraping Google normally requires.

Requirements:

Web Parsing with CSS selectors

Searching for tags in HTML files is both difficult and time-consuming. It is better to use the CSS Selectors Gadget to pick the right tags and make your web scraping journey easier.

This gadget helps you come up with the right CSS selector for your needs. Here is the link to the tutorial, which will teach you to use this gadget for selecting the best CSS selectors according to your needs.

User Agents

The User-Agent header identifies the application, operating system, vendor, and version of the requesting user agent. Setting it to a real browser's value helps our scraper's visit to Google look like one from a real user.

You can also rotate User Agents. Read more about this in the article How to fake and rotate User Agents using Python 3.
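As a rough illustration of rotation in Node JS, the sketch below picks a random User-Agent string from a small pool on each call. The pool contents and the helper name pickUserAgent are assumptions made for this example; the chosen value could then be passed as the "User-Agent" header in the scraper.

```javascript
// Hypothetical pool of User-Agent strings; replace these with values from the browsers you want to imitate.
const USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_10) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4882.194 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
];

// Return a random entry from the pool (hypothetical helper, not part of Puppeteer).
const pickUserAgent = (pool = USER_AGENTS) =>
    pool[Math.floor(Math.random() * pool.length)];

console.log(pickUserAgent());
```

Picking a new value for every browser session means that consecutive runs do not all present the same browser signature.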

If you want to further safeguard your IP from being blocked by Google, you can try these 10 Tips to avoid getting Blocked while Scraping Websites.

Install Libraries

Before we begin, install this library so we can move forward and prepare our scraper.

  1. Puppeteer JS

You can install it by typing the command below in your project terminal:

npm i puppeteer

Target:

Process:

Copy the below target URL to extract the HTML data:

https://www.google.com/maps/search/coffee/@28.6559457,77.1404218,11z

Coffee is our query. After that come the latitude and longitude. The number before z at the end is the zoom level of Google Maps, which you can decrease or increase as you like. Its value ranges from 2.92, at which the map is completely zoomed out, to 21, at which the map is completely zoomed in.

Note: The latitude and longitude must be passed in the URL, but the zoom parameter is optional.
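To make the URL structure concrete, here is a small sketch that assembles a search URL from its parts. The helper name buildMapsSearchUrl is an assumption for this example, and it only reproduces the URL format shown above.

```javascript
// Hypothetical helper: build a Google Maps search URL in the format shown above.
const buildMapsSearchUrl = (query, latitude, longitude, zoom) => {
    // encodeURIComponent keeps multi-word queries such as "coffee shop" valid inside the path.
    const base = `https://www.google.com/maps/search/${encodeURIComponent(query)}/@${latitude},${longitude}`;
    // The zoom parameter is optional, so append it only when it is provided.
    return zoom === undefined ? base : `${base},${zoom}z`;
};

console.log(buildMapsSearchUrl("coffee", 28.6559457, 77.1404218, 11));
// prints https://www.google.com/maps/search/coffee/@28.6559457,77.1404218,11z
```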

We will use the Puppeteer infinite scrolling method to scrape the Google Maps results. So, let us start preparing our scraper.

First, let us create the main function, which will launch the browser and navigate to the target URL.

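const puppeteer = require('puppeteer');

const getMapsData = async () => {
    // Launch Chromium with a visible window (headless: false).
    const browser = await puppeteer.launch({
        headless: false,
        args: ["--disable-setuid-sandbox", "--no-sandbox"],
    });
    const page = await browser.newPage();
    // Send a real browser's User-Agent with every request the page makes.
    await page.setExtraHTTPHeaders({
        "User-Agent":
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_10) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4882.194 Safari/537.36",
    });

    await page.goto("https://www.google.com/maps/search/Starbucks/@26.8484046,75.7215344,12z/data=!3m1!4b1", {
        waitUntil: 'domcontentloaded',
        timeout: 60000,
    });

    // Wait 3 seconds for the results to render. Note: page.waitForTimeout() is not
    // available in newer Puppeteer releases; there, use
    // `await new Promise((resolve) => setTimeout(resolve, 3000))` instead.
    await page.waitForTimeout(3000);

    // Scroll the results container until at least 2 items have been collected.
    const data = await scrollPage(page, ".m6QErb[aria-label]", 2);

    console.log(data);
    await browser.close();
};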

Step-by-step explanation:

  1. puppeteer.launch() - This will launch the Chromium browser with the options we have set in our code. In our case, we are launching our browser in non-headless mode.

  2. browser.newPage() - This will open a new page or tab in the browser.

  3. page.setExtraHTTPHeaders() - It is used to pass HTTP headers with every request the page initiates.

  4. page.goto() - This will navigate the page to the specified target URL.

  5. page.waitForTimeout() - This makes the script wait for 3 seconds before it continues with further operations.

  6. scrollPage() - At last, we called our infinite scroller to extract the data we need with the page, the tag for the scroller div, and the number of items we want as parameters.
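If page.waitForTimeout() is not available in the Puppeteer version you have installed (it was dropped in newer releases), a plain Promise-based pause does the same job. The sketch below is one possible replacement; the helper name sleep is an assumption for this example.

```javascript
// Hypothetical helper: resolve after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Inside an async function such as getMapsData, it would be used as:
//     await sleep(3000);
const demo = async () => {
    const start = Date.now();
    await sleep(50);
    console.log(`Waited about ${Date.now() - start} ms`);
};

demo();
```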

Now, let us prepare the infinite scroller.

const scrollPage = async (page, scrollContainer, itemTargetCount) => {
    let items = [];
    let previousHeight = await page.evaluate(`document.querySelector("${scrollContainer}").scrollHeight`);
    while (itemTargetCount > items.length) {
        items = await extractItems(page);
        // Scroll the results container to its bottom to trigger loading of more results.
        await page.evaluate(`document.querySelector("${scrollContainer}").scrollTo(0, document.querySelector("${scrollContainer}").scrollHeight)`);
        // Give the newly requested results time to render.
        await page.waitForTimeout(2000);
        const currentHeight = await page.evaluate(`document.querySelector("${scrollContainer}").scrollHeight`);
        // Stop when the container no longer grows; otherwise the loop would never end.
        if (currentHeight === previousHeight) break;
        previousHeight = currentHeight;
    }
    return items;
};

Step-by-step explanation:

  1. previousHeight - The scroll height of the container before scrolling.
  2. extractItems() - Function to parse the scraped HTML.
  3. In the next step, we scroll the container down to its bottom, that is, to its current scrollHeight.
  4. In the last step, we wait two seconds for new results to load and then compare the container's height with previousHeight. If the height has not grown, there are no more results, so we stop scrolling instead of looping forever.
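An alternative to a fixed pause is to wait until the container has actually grown. The sketch below assumes Puppeteer's page.evaluate() and page.waitForFunction(); the function name scrollOnce and the 5-second default timeout are assumptions for this example, not part of the scraper above.

```javascript
// Hypothetical helper: scroll the container once and report whether it grew.
const scrollOnce = async (page, scrollContainer, timeout = 5000) => {
    // Remember the height before scrolling.
    const previousHeight = await page.evaluate(
        (selector) => document.querySelector(selector).scrollHeight,
        scrollContainer
    );
    // Scroll the container to its bottom to trigger loading of more results.
    await page.evaluate((selector) => {
        const container = document.querySelector(selector);
        container.scrollTo(0, container.scrollHeight);
    }, scrollContainer);
    try {
        // Resolves as soon as the container is taller than before.
        await page.waitForFunction(
            (selector, height) => document.querySelector(selector).scrollHeight > height,
            { timeout },
            scrollContainer,
            previousHeight
        );
        return true;
    } catch (error) {
        // Any failure here (normally a timeout) is treated as "no more results".
        return false;
    }
};

// Possible usage inside the scroll loop:
//     if (!(await scrollOnce(page, ".m6QErb[aria-label]"))) break;
```

Compared with a fixed two-second wait, this returns as soon as new results arrive and gives a clear signal when the end of the list has been reached.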

Finally, let us look at our parser.

const extractItems = async (page) => {
    const maps_data = await page.evaluate(() => {
        return Array.from(document.querySelectorAll(".Nv2PK")).map((el) => {
            // The place link carries the coordinates and the data ID. It may be missing,
            // so every access below is guarded with optional chaining.
            const link = el.querySelector("a.hfpxzc")?.getAttribute("href");
            return {
                title: el.querySelector(".qBF1Pd")?.textContent.trim(),
                avg_rating: el.querySelector(".MW4etd")?.textContent.trim(),
                reviews: el.querySelector(".UY7F9")?.textContent.replace("(", "").replace(")", "").trim(),
                address: el.querySelector(".W4Efsd:last-child > .W4Efsd:nth-of-type(1) > span:last-child")?.textContent.replaceAll("·", "").trim(),
                description: el.querySelector(".W4Efsd:last-child > .W4Efsd:nth-of-type(2)")?.textContent.replace("·", "").trim(),
                website: el.querySelector("a.lcr4fd")?.getAttribute("href"),
                category: el.querySelector(".W4Efsd:last-child > .W4Efsd:nth-of-type(1) > span:first-child")?.textContent.replaceAll("·", "").trim(),
                timings: el.querySelector(".W4Efsd:last-child > .W4Efsd:nth-of-type(3) > span:first-child")?.textContent.replaceAll("·", "").trim(),
                phone_num: el.querySelector(".W4Efsd:last-child > .W4Efsd:nth-of-type(3) > span:last-child")?.textContent.replaceAll("·", "").trim(),
                extra_services: el.querySelector(".qty3Ue")?.textContent.replaceAll("·", "").replaceAll("  ", " ").trim(),
                latitude: link?.split("!8m2!3d")[1]?.split("!4d")[0],
                longitude: link?.split("!4d")[1]?.split("!16s")[0],
                link,
                // Split on "!1s" rather than "1s" so that a "1s" inside the place name cannot match.
                dataId: link?.split("!1s")[1]?.split("!8m")[0],
            };
        });
    });
    return maps_data;
};

Step-by-step explanation:

  1. document.querySelectorAll() - It will return all the elements that match the specified CSS selector. In our case, it is .Nv2PK.
  2. getAttribute() - This will return the value of the specified attribute of the element.
  3. textContent - It returns the text content inside the selected HTML element.
  4. split() - Used to split a string into substrings with the help of a specified separator and return them as an array.
  5. trim() - Removes whitespace from the start and end of the string.
  6. replaceAll() - Replaces every occurrence of the specified pattern in the string.
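The coordinates and the data ID can also be pulled out of the place link with regular expressions instead of chained split() calls. The sketch below is one way to do it; the helper name parsePlaceLink is an assumption for this example, and the link format is taken from the sample results in this tutorial.

```javascript
// Hypothetical helper: extract latitude, longitude and data ID from a Google Maps place link.
const parsePlaceLink = (link) => {
    // "!3d<latitude>!4d<longitude>" holds the coordinates of the place.
    const coords = link.match(/!3d(-?[\d.]+)!4d(-?[\d.]+)/);
    // "!1s<dataId>" holds the data ID, which runs up to the next "!".
    const dataId = link.match(/!1s([^!]+)/);
    return {
        latitude: coords ? coords[1] : null,
        longitude: coords ? coords[2] : null,
        dataId: dataId ? dataId[1] : null,
    };
};

const sampleLink =
    "https://www.google.com/maps/place/The+Coffee+Bean+%26+Tea+Leaf/data=!4m7!3m6!1s0x390ce3a69997ad37:0xff83fd9a57a7a71e!8m2!3d28.6511983!4d77.1215014!16s%2Fg%2F11sgxr14tq!19sChIJN62XmabjDDkRHqenV5r9g_8?authuser=0&hl=en&rclk=1";

console.log(parsePlaceLink(sampleLink));
```

Returning null for a missing part keeps one unusual link from crashing the whole extraction.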

Here is the full code:

const puppeteer = require('puppeteer');

// Parse every result card that is currently rendered in the results list.
const extractItems = async (page) => {
    const maps_data = await page.evaluate(() => {
        return Array.from(document.querySelectorAll(".Nv2PK")).map((el) => {
            // The place link may be missing, so every access below uses optional chaining.
            const link = el.querySelector("a.hfpxzc")?.getAttribute("href");
            return {
                title: el.querySelector(".qBF1Pd")?.textContent.trim(),
                avg_rating: el.querySelector(".MW4etd")?.textContent.trim(),
                reviews: el.querySelector(".UY7F9")?.textContent.replace("(", "").replace(")", "").trim(),
                address: el.querySelector(".W4Efsd:last-child > .W4Efsd:nth-of-type(1) > span:last-child")?.textContent.replaceAll("·", "").trim(),
                description: el.querySelector(".W4Efsd:last-child > .W4Efsd:nth-of-type(2)")?.textContent.replace("·", "").trim(),
                website: el.querySelector("a.lcr4fd")?.getAttribute("href"),
                category: el.querySelector(".W4Efsd:last-child > .W4Efsd:nth-of-type(1) > span:first-child")?.textContent.replaceAll("·", "").trim(),
                timings: el.querySelector(".W4Efsd:last-child > .W4Efsd:nth-of-type(3) > span:first-child")?.textContent.replaceAll("·", "").trim(),
                phone_num: el.querySelector(".W4Efsd:last-child > .W4Efsd:nth-of-type(3) > span:last-child")?.textContent.replaceAll("·", "").trim(),
                extra_services: el.querySelector(".qty3Ue")?.textContent.replaceAll("·", "").replaceAll("  ", " ").trim(),
                latitude: link?.split("!8m2!3d")[1]?.split("!4d")[0],
                longitude: link?.split("!4d")[1]?.split("!16s")[0],
                link,
                // Split on "!1s" rather than "1s" so that a "1s" inside the place name cannot match.
                dataId: link?.split("!1s")[1]?.split("!8m")[0],
            };
        });
    });
    return maps_data;
};

// Keep scrolling the results container until enough items are collected or it stops growing.
const scrollPage = async (page, scrollContainer, itemTargetCount) => {
    let items = [];
    let previousHeight = await page.evaluate(`document.querySelector("${scrollContainer}").scrollHeight`);
    while (itemTargetCount > items.length) {
        items = await extractItems(page);
        await page.evaluate(`document.querySelector("${scrollContainer}").scrollTo(0, document.querySelector("${scrollContainer}").scrollHeight)`);
        // Give the newly requested results time to render.
        await page.waitForTimeout(2000);
        const currentHeight = await page.evaluate(`document.querySelector("${scrollContainer}").scrollHeight`);
        // Stop when the container no longer grows; otherwise the loop would never end.
        if (currentHeight === previousHeight) break;
        previousHeight = currentHeight;
    }
    return items;
};

const getMapsData = async () => {
    const browser = await puppeteer.launch({
        headless: false,
        args: ["--disable-setuid-sandbox", "--no-sandbox"],
    });
    // Reuse the tab that the browser opens on launch.
    const [page] = await browser.pages();
    await page.setExtraHTTPHeaders({
        "User-Agent":
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_10) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4882.194 Safari/537.36",
    });

    await page.goto("https://www.google.com/maps/search/Starbucks/@26.8484046,75.7215344,12z/data=!3m1!4b1", {
        waitUntil: 'domcontentloaded',
        timeout: 60000,
    });

    // Note: page.waitForTimeout() is not available in newer Puppeteer releases; there, use
    // `await new Promise((resolve) => setTimeout(resolve, 5000))` instead.
    await page.waitForTimeout(5000);

    const data = await scrollPage(page, ".m6QErb[aria-label]", 2);

    console.log(data);
    await browser.close();
};

getMapsData();

Results:

Our result should look like this 👇🏻:

 [
    {
        title: 'The Coffee Bean & Tea Leaf',
        avg_rating: '4.7',
        reviews: '79',
        address: 'The Coffee Bean & Tea Lea,Ground Floor, Epicuria Food Court, Plot No-10 Shivaji Place, Najafgarh Rd',
        description: 'Chain coffee bar known for frozen drinks',
        category: 'Coffee shop',
        timings: 'Open ⋅ Closes 11PM',
        phone_num: 'Open ⋅ Closes 11PM',
        extra_services: 'Dine-in   Drive-through   No-contact delivery      Reserve a table',
        latitude: '28.6511983',
        longitude: '77.1215014',
        link: 'https://www.google.com/maps/place/The+Coffee+Bean+%26+Tea+Leaf/data=!4m7!3m6!1s0x390ce3a69997ad37:0xff83fd9a57a7a71e!8m2!3d28.6511983!4d77.1215014!16s%2Fg%2F11sgxr14tq!19sChIJN62XmabjDDkRHqenV5r9g_8?authuser=0&hl=en&rclk=1',
        dataId: '0x390ce3a69997ad37:0xff83fd9a57a7a71e'
    },
    {
        title: 'The Coffee Bean & Tea Leaf',
        avg_rating: '4.0',
        reviews: '271',
        address: 'T320, Ambience Mall, Gurgaon - Delhi Expy',
        description: 'Chain coffee bar known for frozen drinks',
        category: 'Coffee shop',
        timings: 'Open ⋅ Closes 11PM',
        phone_num: 'Open ⋅ Closes 11PM',
        extra_services: 'Dine-in   Takeaway   No-contact delivery',
        latitude: '28.5041789',
        longitude: '77.0970538',
        link: 'https://www.google.com/maps/place/The+Coffee+Bean+%26+Tea+Leaf/data=!4m7!3m6!1s0x390d194c1a223247:0x611f25bf4fddaf08!8m2!3d28.5041789!4d77.0970538!16s%2Fg%2F11cs6ch67r!19sChIJRzIiGkwZDTkRCK_dT78lH2E?authuser=0&hl=en&rclk=1',
        dataId: '0x390d194c1a223247:0x611f25bf4fddaf08'
    },
    .....                             
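
Every value in these results is a string. If you want to sort or filter the places later, a small post-processing step can convert the numeric fields. The sketch below is one possible way to do that; the helper name normalizePlace is an assumption for this example, and the field names follow the results shown above.

```javascript
// Hypothetical helper: convert the numeric fields of a scraped place from strings to numbers.
const normalizePlace = (place) => ({
    ...place,
    avg_rating: place.avg_rating ? Number.parseFloat(place.avg_rating) : null,
    // Review counts may contain thousands separators such as "1,234".
    reviews: place.reviews ? Number.parseInt(place.reviews.replace(/,/g, ""), 10) : null,
    latitude: place.latitude ? Number.parseFloat(place.latitude) : null,
    longitude: place.longitude ? Number.parseFloat(place.longitude) : null,
    // The services are separated by runs of two or more whitespace characters.
    extra_services: place.extra_services
        ? place.extra_services.split(/\s{2,}/).filter(Boolean)
        : [],
});

const place = normalizePlace({
    title: "The Coffee Bean & Tea Leaf",
    avg_rating: "4.0",
    reviews: "271",
    latitude: "28.5041789",
    longitude: "77.0970538",
    extra_services: "Dine-in   Takeaway   No-contact delivery",
});

console.log(place);
```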
           

Serpdog's Google Maps Reviews API

Scraping can become a time-consuming process in the long run, as it requires you to maintain the scraper whenever the CSS selectors change. To solve this problem, we at Serpdog also offer a Google Maps Reviews API that returns both the HTML and ready-made structured JSON data. We are currently working on a Google Maps API, which we will launch after some time.

Scraping Google also requires solving captchas and maintaining a large pool of User Agents and proxies, but Serpdog handles all of these problems on your behalf for a smooth scraping experience.

Our users also get 100 free requests on first sign-up.

const axios = require('axios');

axios.get('https://api.serpdog.io/reviews?api_key=APIKEY&data_id=0x89c25090129c363d:0x40c6a5770d25022b')
    .then(response => {
        console.log(response.data);
    })
    .catch(error => {
        console.log(error);
    });
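
To avoid hard-coding the API key in the request string, the URL can be assembled from its parts. The sketch below uses Node's built-in URL class; the helper name buildReviewsUrl and the environment variable SERPDOG_API_KEY are assumptions for this example, while the endpoint and parameter names come from the request above.

```javascript
// Hypothetical helper: build the reviews request URL from an API key and a data ID.
const buildReviewsUrl = (apiKey, dataId) => {
    const url = new URL("https://api.serpdog.io/reviews");
    // searchParams takes care of escaping special characters in the values.
    url.searchParams.set("api_key", apiKey);
    url.searchParams.set("data_id", dataId);
    return url.toString();
};

// SERPDOG_API_KEY is an assumed variable name; "APIKEY" is the placeholder used in the request above.
const apiKey = process.env.SERPDOG_API_KEY || "APIKEY";
console.log(buildReviewsUrl(apiKey, "0x89c25090129c363d:0x40c6a5770d25022b"));
```

The resulting string can be passed straight to axios.get().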

Results:

"location_info": {
    "title": "Statue of Liberty",
    "address": "New York, NY",
    "avgRating": "4.7",
    "totalReviews": "83,109 reviews"
    },
    "reviews": [
    {
        "user": {
        "name": "Vo Kien Thanh",
        "link": "https://www.google.com/maps/contrib/106465550436934504158?hl=en-US&sa=X&ved=2ahUKEwj7zY_J4cv4AhUID0QIHZCtC0cQvvQBegQIARAZ",
        "thumbnail": "https://lh3.googleusercontent.com/a/AATXAJxv5_uPnmyIeoARlf7gMWCduHV1cNI20UnwPicE=s40-c-c0x00000000-cc-rp-mo-ba4-br100",
        "localGuide": true,
        "reviews": "111",
        "photos": "329"
        },
        "rating": "Rated 5.0 out of 5,",
        "duration": "5 months ago",
        "snippet": "The icon of the U.S. 🗽🇺🇸. This is a must-see for everyone who visits New York City, you would never want to miss it.There’s only one cruise line that is allowed to enter the Liberty Island and Ellis Island, which is Statue Cruises. You can purchase tickets at the Battery Park but I’d recommend you purchase it in advance. For $23/adult it’s actually very reasonably priced. Make sure you go early because you will have to go through security at the port. Also take a look at the departure schedule available on the website to plan your trip accordingly.As for the Statue of Liberty, it was my first time seeing it in person so what I could say was a wow. It was absolutely amazing to see this monument. I also purchased the pedestal access so it was pretty cool to see the inside of the statue. They’re not doing the Crown Access due to Covid-19 concerns, but I hope it will be resumed soon.There are a gift shop, a cafeteria and a museum on the island. I would say it takes around 2-3 hours to do everything here because you would want to take as many photos as possible.I absolutely loved it here and I can’t wait to come back.The icon of the U.S. 🗽🇺🇸. This is a must-see for everyone who visits New York City, you would never want to miss it. …More",
        "visited": "",
        "likes": "91",
        "images": [
        "https://lh5.googleusercontent.com/p/AF1QipPOBhJtq17DAc9_ZTBnN2X4Nn-EwIEet61Y9JQo=w100-h100-p-n-k-no",
        "https://lh5.googleusercontent.com/p/AF1QipPZ2ut1I7LnECqEB2vzrBk-PSXzBxaHEE4S54lk=w100-h100-p-n-k-no",
        "https://lh5.googleusercontent.com/p/AF1QipM8nIogBhwcL-dUrd7KaIxZcc_SA6YnJpp50R0C=w100-h100-p-n-k-no",
        "https://lh5.googleusercontent.com/p/AF1QipPQ-YP7uw_gHTNb1gGZSGRGRrzLMzOrvh98AmSN=w100-h100-p-n-k-no",
        "https://lh5.googleusercontent.com/p/AF1QipOTqBzK30vQZi9lfuhpk5329bnx-twzgIVjwcI1=w100-h100-p-n-k-no",
        "https://lh5.googleusercontent.com/p/AF1QipN0TWUE35ajoTdSKelspuUpK-ZTXlRRR9SfPbTa=w100-h100-p-n-k-no",
        "https://lh5.googleusercontent.com/p/AF1QipPQH_4HtdXmSdkCiDTv2jO30LksCxpe9KQI4YKw=w100-h100-p-n-k-no",
        "https://lh5.googleusercontent.com/p/AF1QipN_OfX2TgXVNry5fli5v-yExbyTAfV4K7SEy3T0=w100-h100-p-n-k-no",
        "https://lh5.googleusercontent.com/p/AF1QipNWKl0TeBmnzMaR_W4-7skitDwHjjJxPePbiSyd=w100-h100-p-n-k-no"
        ]
    },
        ........
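
The rating of each review arrives as text, for example "Rated 5.0 out of 5,". The sketch below shows one way to turn these strings into numbers and average them. The helper names parseRating and averageRating are assumptions for this example, and the sample array is made-up data that only imitates the "rating" field shown above.

```javascript
// Hypothetical helper: extract the numeric part of a rating string such as "Rated 5.0 out of 5,".
const parseRating = (ratingText) => {
    const match = /Rated\s+(\d+(?:\.\d+)?)/.exec(ratingText || "");
    return match ? Number.parseFloat(match[1]) : null;
};

// Hypothetical helper: average the ratings of a list of reviews, skipping unparsable ones.
const averageRating = (reviews) => {
    const ratings = reviews
        .map((review) => parseRating(review.rating))
        .filter((value) => value !== null);
    if (ratings.length === 0) return null;
    return ratings.reduce((sum, value) => sum + value, 0) / ratings.length;
};

// Made-up sample data in the shape of the "reviews" array.
const sampleReviews = [
    { rating: "Rated 5.0 out of 5," },
    { rating: "Rated 4.0 out of 5," },
];

console.log(averageRating(sampleReviews)); // prints 4.5
```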

Conclusion:

This tutorial taught us to scrape Google Maps Results using Node JS. Feel free to message me if I missed something. Follow me on Twitter. Thanks for reading!

Additional Resources

  1. Scrape Google Organic Search Results

  2. Web Scraping Google Images

  3. Scrape Google News Results

  4. Web Scraping Google With Node JS - A Complete Guide

Originally published here.


Written by darshan12 | Founder at serpdog.io
Published by HackerNoon on 2022/11/20