Getting into Puppeteer : Inject | Interact | Keys | Capture

By the end of this article you will know how to

Inject custom JavaScript functions into another page’s context.
Interact with forms to automate tasks.
Use keyboard and mouse events to type into and click selected elements.
Capture screenshots of a page or of a particular element on it.
Select elements from the DOM.
Access cookies.

Introduction

Headless browsers are making a real contribution to browser automation and testing. Most are used for unit and end-to-end testing, while a few are a good fit for general browser automation. A headless (or “invisible”) browser has all the functionality of a normal browser but shows no window and is driven from the command line or from code. Beyond testing, headless browsers are also used for web scraping, capturing screenshots, and injecting scripts to automate interaction with websites.

In this article we will go through Puppeteer, the recently launched Node API for Google Chrome from the Chrome DevTools team. Other recently released tools in this space include Chromeless, Chrominator, Chromy, Navalia and Lambdium. There is also Nightmare.js, which is similar to Puppeteer but uses Electron behind the scenes (Puppeteer is built solely on Chromium), and the good old PhantomJS, which provides a solid API for testing.

About puppeteer

“Puppeteer is a Node library which provides a high-level API to control headless Chrome over the DevTools Protocol. It can also be configured to use full (non-headless) Chrome.” is how the Google dev team describes it.

Puppeteer can be used to

Automate visual testing.
Generate images and PDFs of websites without opening a browser window.
Submit forms and simulate browser usage.
Scrape websites.
Capture a timeline trace.

and much more.

Let’s get started with the basic usage of Puppeteer.

You can also try it out in the new Puppeteer playground at https://try-puppeteer.appspot.com/

Installing Puppeteer is not tricky at all. I will walk you through a quick setup.

On an Ubuntu machine

sudo apt-get update

Install Node.js v8 from NodeSource (I used v8.4.0). Find the setup script for your system at https://github.com/nodesource/distributions

# This installs Node.js on an Ubuntu based system

curl -sL https://deb.nodesource.com/setup_8.x | sudo -E bash -

sudo apt-get install -y nodejs

Along with that, Chromium needs the following system packages. On Ubuntu

sudo apt-get install gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget

Install npm if it is not already present (the NodeSource nodejs package normally includes it)

sudo apt-get install npm

Now, in your working directory, run

npm init

npm install puppeteer

You are now good to go. Installing Puppeteer also downloads a recent version of Chromium. You do not need a display: because the browser is headless, it can run on a server with only a command line. If you run into problems, see the troubleshooting guide at https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md and, if you are on Ubuntu Xenial, follow the steps I posted at https://github.com/GoogleChrome/puppeteer/issues/290#issuecomment-324838511

Getting started

Let’s generate a screenshot of a website at iPad Pro dimensions (768px × 1024px, as per http://screensiz.es/).

After installing Puppeteer, create a file called screenshot.js in the npm project folder and add the code below to it.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Set the viewport before navigating so the page renders at this size.
  await page.setViewport({width: 768, height: 1024});
  await page.goto('https://google.com');
  await page.screenshot({path: 'google.png'});
  await browser.close();
})();

Now run the script in the terminal

node screenshot.js
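If you need the same capture at several sizes, the three calls can be wrapped in a small helper. This is only a sketch: captureAt is a name I made up, not a Puppeteer function, and it assumes you pass in a Puppeteer page object created as in the script above.

```javascript
// Hypothetical helper (not part of Puppeteer): capture a URL at a given viewport size.
// `page` is assumed to be a Puppeteer Page, e.g. from `await browser.newPage()`.
async function captureAt(page, url, viewport, path) {
  // Set the viewport first so the page lays out at the target size from the start.
  await page.setViewport(viewport);
  await page.goto(url);
  // fullPage: false (the default) captures only the visible viewport.
  await page.screenshot({ path, fullPage: false });
  return path;
}
```

Inside the async function of screenshot.js it could be called as await captureAt(page, 'https://google.com', {width: 768, height: 1024}, 'google.png');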

After the script runs successfully you will have a screenshot of google.com. With one more line of code you can also save the page as a PDF. Add the line below before browser.close()

await page.pdf({path: 'google.pdf', format: 'A4'});

Besides A4, the format option accepts other paper sizes: Letter, Legal, Tabloid, Ledger, A0, A1, A2, A3 and A5. See the page.pdf() entry in the Puppeteer API docs for more.
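A typo in the format name is easy to make, so it can help to check it before calling page.pdf(). Here is a minimal sketch; pdfOptions and PDF_FORMATS are hypothetical names of mine, and the list simply mirrors the sizes mentioned above, so treat it as an assumption for your Puppeteer version.

```javascript
// Paper formats named in this article; an assumption, check your Puppeteer version.
const PDF_FORMATS = ['Letter', 'Legal', 'Tabloid', 'Ledger', 'A0', 'A1', 'A2', 'A3', 'A4', 'A5'];

// Hypothetical helper: build the options object for page.pdf(),
// rejecting unknown formats before the browser is involved.
function pdfOptions(path, format = 'A4') {
  if (!PDF_FORMATS.includes(format)) {
    throw new Error(`Unsupported PDF format: ${format}`);
  }
  return { path, format };
}
```

Usage would look like await page.pdf(pdfOptions('google.pdf', 'Letter'));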

Let’s use the screenshot feature in a slightly different way. Recently, while scrolling through my Twitter feed, I saw a user ask a company that provides screen capture as a service for a new feature. What he wanted was this: when you pass an element’s ID or class, only that element should be captured instead of the whole page. So I thought of implementing it here.

To do this we have to find the position and dimensions of the element and then use them to clip the screenshot. Puppeteer lets you clip a particular area of a screenshot. More about this option here https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagescreenshotoptions

We will use the $eval method of the page API. It accepts a selector and a function to be executed in the page’s context. The element’s offsetWidth and offsetHeight give its dimensions, and offsetLeft and offsetTop give its position; together they define the area to clip. Note that offsetLeft and offsetTop are measured relative to the element’s offsetParent, so they equal page coordinates only when that parent is the document body.

The script below captures Smashing Magazine’s sidebar.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setViewport({width: 1400, height: 2500});
  await page.goto('https://www.smashingmagazine.com/');
  // Measure the sidebar inside the page: [width, height, top, left].
  const box = await page.$eval('.sn', e => [e.offsetWidth, e.offsetHeight, e.offsetTop, e.offsetLeft]);
  // x is the left offset and y is the top offset.
  await page.screenshot({path: 'clip.png', clip: {x: box[3], y: box[2], width: box[0], height: box[1]}});
  await browser.close();
})();

The image on the left is the sidebar of the Smashing Magazine website. Tweaks like this are helpful in many situations, and you can adapt them to your own needs.
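One way to adapt it is to measure the element with getBoundingClientRect(), which does not depend on the element’s offsetParent. The sketch below is my own variation, not the script used for the image above: clipFromRect and screenshotElement are hypothetical names, and it assumes the clip coordinates are relative to the page, which is why the scroll offset is added.

```javascript
// Convert a rect-like object into the `clip` option of page.screenshot().
function clipFromRect(rect) {
  return { x: rect.left, y: rect.top, width: rect.width, height: rect.height };
}

// Hypothetical helper: screenshot only the element matching `selector`.
// `page` is assumed to be a Puppeteer Page.
async function screenshotElement(page, selector, path) {
  // This callback runs inside the page. The scroll offset is added so the
  // coordinates are relative to the whole page, not the visible viewport.
  const rect = await page.$eval(selector, (el) => {
    const r = el.getBoundingClientRect();
    return {
      left: r.left + window.scrollX,
      top: r.top + window.scrollY,
      width: r.width,
      height: r.height,
    };
  });
  await page.screenshot({ path, clip: clipFromRect(rect) });
  return rect;
}
```

For the sidebar above the call would be await screenshotElement(page, '.sn', 'clip.png');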

Now let’s move on to interacting with a form.

Interacting with the webpage/form

I consider this to be the most powerful part of headless browsers: interacting with the page as though it were open in a normal browser.

Let’s capture a screenshot of the big brother, https://google.com

The screenshot method of the page API does the job for us.

Checkpoint #1 : Now let’s start typing into the search box without opening the browser. How cool is that! Just as in a normal browser, the input element has to be focused before you can type into it, so first we need to find the element to focus on.

Checkpoint #2 : Right click on the search input and select Inspect. (To learn more about DevTools inspection, visit https://developer.chrome.com/devtools.) Note the id of that element. It is “lst-ib”, and I don’t know what lst-ib stands for. page.click(), page.type() and page.tap() are a few of the many interaction functions. Focus the element by clicking on it with page.click(selector), type the text with page.type(), and then capture the screenshot.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://google.com');
  await page.screenshot({path: 'google.png'});

  // Before v0.12.0: focus the input with a click, then type.
  // await page.click('#lst-ib');
  // await page.type('Smashing magazine');

  // After v0.12.0 (as suggested by https://medium.com/@dan_kim):
  // page.type takes the selector as its first argument.
  await page.type('#lst-ib', 'Smashing magazine');

  await page.screenshot({path: 'google1.png'});
  await browser.close();
})();
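On slower pages the input may not exist yet when the script tries to type. A small guard is to wait for the selector first. This is a sketch under the assumption that you are on v0.12.0 or later (selector-first page.type); fillField is a hypothetical name, not a Puppeteer method.

```javascript
// Hypothetical helper: wait for a field to appear, then type into it.
// Assumes Puppeteer v0.12.0 or later, where page.type takes the selector first.
async function fillField(page, selector, text) {
  // Resolves once an element matching the selector is present in the DOM.
  await page.waitForSelector(selector);
  await page.type(selector, text);
}
```

In the script above, await fillField(page, '#lst-ib', 'Smashing magazine'); would stand in for the page.type call.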

Checkpoint #3 : Now we have to either click the submit button or press the Enter key. Let’s explore both ways, and you will see how flexible the API is, even for key presses.

Now that we have typed the query, let’s click the “Google Search” button. Inspecting it shows that it has no id, so we have to use another selector. You can try this in the console on Google’s homepage

document.querySelectorAll('input[type="submit"]')[0]

which gives you the submit input element.

In Puppeteer we have to click the button and wait for the resulting navigation, which can be done with page.waitForNavigation(), and then capture a screenshot again.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://google.com');
  await page.screenshot({path: 'google.png'});
  // v0.12.0 and later: page.type takes the selector first.
  await page.type('#lst-ib', 'Smashing magazine');
  await page.screenshot({path: 'google1.png'});
  // The page has two input[type="submit"] elements; page.click picks the first one.
  // Start waiting for the navigation before clicking so it is not missed.
  await Promise.all([
    page.waitForNavigation(),
    page.click('input[type="submit"]'),
  ]);
  await page.screenshot({path: 'google2.png'});
  await browser.close();
})();
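Clicking something and waiting for the page it leads to comes up again and again, so it is worth a tiny helper. In this sketch clickAndWait is a hypothetical name; the point is that the wait is registered before the click happens, so a fast navigation cannot slip past it.

```javascript
// Hypothetical helper: click an element and wait for the navigation it triggers.
// `page` is assumed to be a Puppeteer Page.
async function clickAndWait(page, selector) {
  await Promise.all([
    page.waitForNavigation(), // called first, so the listener exists before the click
    page.click(selector),
  ]);
}
```

With it, the submit step becomes await clickAndWait(page, 'input[type="submit"]');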

Now let’s try searching with the Enter key instead. After checkpoint #2, notice the use of the keyboard class’s down() method.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://google.com');
  await page.screenshot({path: 'google.png'});
  // Focus the input element and type (v0.12.0 and later signature).
  await page.type('#lst-ib', 'Medium top articles');
  await page.screenshot({path: 'google1.png'});
  // Start waiting for the navigation before pressing Enter.
  await Promise.all([
    page.waitForNavigation(),
    page.keyboard.down('Enter'), // Press enter
  ]);
  await page.keyboard.up('Enter'); // Release the key
  await page.screenshot({path: 'google2.png'});
  await browser.close();
})();
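The keyboard class also has a press() method, which sends the key down and key up events in one call. Here is a sketch of the whole type-and-submit step as one function. typeAndSubmit is a hypothetical name, and the 50 ms delay between keystrokes is an arbitrary value I picked to make the typing look a bit more human.

```javascript
// Hypothetical helper: type a query into a field and submit it with the Enter key.
// Assumes Puppeteer v0.12.0 or later (selector-first page.type).
async function typeAndSubmit(page, selector, text) {
  // `delay` is the pause between key presses in milliseconds (arbitrary value).
  await page.type(selector, text, { delay: 50 });
  await Promise.all([
    page.waitForNavigation(), // register the wait before the key press
    page.keyboard.press('Enter'), // key down followed by key up
  ]);
}
```

Usage would be await typeAndSubmit(page, '#lst-ib', 'Medium top articles');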

So far we have seen how to select elements with selectors and how to use Puppeteer to interact with form fields by simulating mouse and keyboard events.

Running custom JavaScript functions inside the page’s context

Let’s execute a function inside the page’s context and change Google’s logo to Smashing Magazine’s.

Inspecting Google’s logo on google.com shows that its id is ‘hplogo’. We will use the page API’s $eval function, which takes a selector, a function, and optionally arguments for that function, and evaluates the function inside the page’s context, in this case google.com

const puppeteer = require('puppeteer');

// Not strictly necessary: the Smashing Magazine logo takes a moment to load,
// so we pause briefly before taking the screenshot. A promise-based timer is
// used here so that no extra package is needed.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://google.com');
  await page.$eval('#hplogo', e => e.children[0].innerHTML = '<img src="https://media-mediatemple.netdna-ssl.com/wp-content/themes/smashing-magazine/assets/images/logo.svg">');
  await sleep(2000); // wait for 2 seconds so that the SM logo loads completely
  await page.screenshot({path: 'jack.png'}); // capture the screenshot
  await browser.close();
})();
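A fixed pause is a guess. Since page.evaluate() waits for a promise returned by the injected function, the page itself can tell us when the image has finished loading. The following is only a sketch of that idea: swapInImage is a hypothetical name, and it replaces the whole content of the matched element rather than its first child as the script above does.

```javascript
// Hypothetical helper: replace the content of `selector` with an image and
// resolve to true once the image has loaded (false if it fails or nothing matches).
// `page` is assumed to be a Puppeteer Page.
async function swapInImage(page, selector, src) {
  // The callback runs inside the page; `selector` and `src` are passed in as
  // arguments because the page cannot see variables from the Node side.
  return page.evaluate((sel, url) => new Promise((resolve) => {
    const target = document.querySelector(sel);
    if (!target) {
      resolve(false);
      return;
    }
    const img = document.createElement('img');
    img.onload = () => resolve(true);
    img.onerror = () => resolve(false);
    img.src = url;
    target.innerHTML = '';
    target.appendChild(img);
  }), selector, src);
}
```

After const loaded = await swapInImage(page, '#hplogo', logoUrl); the screenshot can be taken right away when loaded is true (logoUrl being whatever image address you want to inject).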

Misc

Sometimes you will want to see what the browser is displaying. To run it non-headless, pass headless : false when launching

const browser = await puppeteer.launch({headless : false});

This runs the script and also opens a visible browser window.
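When watching a run it also helps to slow it down, which the slowMo launch option does. A small sketch, assuming you switch modes with a boolean of your own; launchOptions is a hypothetical name and 250 ms is an arbitrary value.

```javascript
// Hypothetical helper: pick launch options depending on whether you want to watch the run.
function launchOptions(debug) {
  return debug
    ? { headless: false, slowMo: 250 } // visible window, each operation slowed by 250 ms
    : { headless: true }; // normal headless run
}
```

It would be used as const browser = await puppeteer.launch(launchOptions(true));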

Let’s access the cookies stored by google.com

The page.cookies() method gives you access to the page’s cookies.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://google.com');
  const cookies = await page.cookies();
  console.log(cookies); // outputs the cookies
  await browser.close();
})();

Output on my box.

[
  {name: '1P_JAR', value: '2017-9-12-13', domain: '.google.co.in', path: '/', expires: 1505827402, size: 18, httpOnly: false, secure: false, session: false},
  {name: 'NID', value: 'I have erased the cookies value. Just incase', domain: '.google.co.in', path: '/', expires: 1521033802.002931, size: 135, httpOnly: true, secure: false, session: false}
]
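The result is a plain array of objects, so ordinary JavaScript is enough to pick it apart. A short sketch with two helpers of my own (cookiesByName and httpOnlyNames are hypothetical names), working on objects shaped like the output above:

```javascript
// Hypothetical helper: turn the array from page.cookies() into a name -> value lookup.
function cookiesByName(cookies) {
  const byName = {};
  for (const cookie of cookies) {
    byName[cookie.name] = cookie.value;
  }
  return byName;
}

// Hypothetical helper: names of the cookies that scripts in the page cannot read.
function httpOnlyNames(cookies) {
  return cookies.filter((cookie) => cookie.httpOnly).map((cookie) => cookie.name);
}
```

For the output above, cookiesByName(cookies)['1P_JAR'] is '2017-9-12-13' and httpOnlyNames(cookies) contains only 'NID'.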

Many other headless browser tools have appeared recently, such as Chromeless (https://github.com/graphcool/chromeless), Chrominator (https://github.com/jesg/chrominator) and many others.

Useful links

https://github.com/GoogleChrome/puppeteer/

Chrome Headless doesn't launch on Debian · Issue #290 · GoogleChrome/puppeteer: https://github.com/GoogleChrome/puppeteer/issues/290
