Efficiently snapshotting your single-page-apps with Puppeteer

Written by cheapsteak | Published 2018/02/24


Running a screenshot service on a $5-per-month VPS

Motivations

My hobby project, npmcharts, is a single-page app that shows the download trends of various npm packages. For example, if you were looking into which headless Chrome library to use, you'd see this graph:

However, when that page is shared to Facebook, Twitter, or Slack, the preview image is always the same: a screenshot comparing frontend frameworks that I took two years ago and uploaded as the site's sole Open Graph image. Tsk tsk.

The problem

Everything was hosted on my little $5 DigitalOcean droplet. One reason I'd put this feature off for so long was that I worried the droplet couldn't handle the load of a snapshotting service, so I'd have to be efficient.

Starting out simple

Let's start simple and first get something that works: a getChartImage function, called by an Express route handler for /chart-image, that launches a browser, takes a screenshot of the chart page, and closes the browser again.
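Here is a minimal sketch of what that first getChartImage might look like. It is not the original code: the 1200x630 viewport (a common Open Graph image size) and the networkidle0 wait are my assumptions, and puppeteer is passed in as an argument so the function can be run against a stub. In the real server you would pass require("puppeteer").

```javascript
// Naive version: a fresh Chrome instance for every screenshot request.
// `puppeteer` is injected; in real use pass require("puppeteer").
async function getChartImage(puppeteer, url) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Assumed size: 1200x630 is a common Open Graph image size.
    await page.setViewport({ width: 1200, height: 630 });
    // Wait until the network is idle so the chart has finished loading.
    await page.goto(url, { waitUntil: "networkidle0" });
    return await page.screenshot({ type: "png" });
  } finally {
    // Always tear the browser down, even if navigation or capture fails.
    await browser.close();
  }
}
```

In the Express app, the /chart-image handler would build the chart URL from the request, await getChartImage, and send the buffer back with res.type("png").send(image).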

Fairly straightforward, but launching a browser instance for each request and closing it afterward seems wasteful.

The average time to return a screenshot is currently ~3.5 seconds on my MacBook, and it will only be slower on the DigitalOcean droplet. Let's reduce and reuse.

Note: All timings are measured against local servers running on my 2015 MacBook Pro. They only account for how long the function takes to return an image; network transfer time is not included!

Pooling browsers

Pooling would allow us to keep a handful of browser instances open and reuse them for each screenshot request. Puppeteer doesn’t come with built-in pooling solutions, but there are a few generic libraries available. We’ll be using [generic-pool](https://www.npmjs.com/package/generic-pool).

The first version pools whole browsers.
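A minimal sketch of that browser pool's factory, assuming generic-pool v3, whose createPool(factory, options) takes a factory with create, destroy, and optional validate functions that return promises. Only the factory is shown as code so it runs without generic-pool installed; browserFactory is a hypothetical name.

```javascript
// Hypothetical factory for a generic-pool pool of whole browsers.
// generic-pool calls create() when it needs another resource, validate()
// on borrow (only when testOnBorrow is enabled), and destroy() on eviction.
function browserFactory(puppeteer) {
  return {
    create: () => puppeteer.launch(),
    // Reject browsers whose Chrome process has crashed or disconnected.
    validate: async (browser) => browser.isConnected(),
    destroy: (browser) => browser.close(),
  };
}
```

With generic-pool installed, the pool itself would be something like genericPool.createPool(browserFactory(require("puppeteer")), { min: 1, max: 2, testOnBorrow: true }). The min and max values are illustrative guesses for a small droplet, not measured ones.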

But wait, we shouldn't need to create a page and set its viewport for every screenshot either. Let's pool pages instead of browsers.
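A sketch of a page factory for the same kind of pool. I'm assuming each pooled page still gets its own browser, since the update further down proposes sharing one instead. The viewport is set once in create(), so borrowed pages arrive ready to use, and destroy() closes the owning browser through Puppeteer's page.browser(). pageFactory and the default viewport are hypothetical.

```javascript
// Hypothetical factory for pooling pages rather than browsers. Each pooled
// page is created and sized once, then reused across screenshot requests.
// Assumption: one browser per pooled page.
function pageFactory(puppeteer, viewport = { width: 1200, height: 630 }) {
  return {
    async create() {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.setViewport(viewport);
      return page;
    },
    // Closing the owning browser also closes its page.
    destroy: (page) => page.browser().close(),
  };
}
```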

Let's update our getChartImage function to borrow pages from the pool.
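A sketch of the pooled getChartImage, written against generic-pool's acquire() and release(). The pool is passed in, and the networkidle0 and PNG options are the same assumptions as in the first version.

```javascript
// Borrow a ready-made page from the pool instead of launching Chrome.
// `pool` is anything exposing generic-pool's acquire()/release().
async function getChartImage(pool, url) {
  const page = await pool.acquire();
  try {
    await page.goto(url, { waitUntil: "networkidle0" });
    return await page.screenshot({ type: "png" });
  } finally {
    // Give the page back even on failure so the pool never leaks it.
    await pool.release(page);
  }
}
```

Releasing in finally matters: with a fixed max, every leaked page shrinks the pool, and once it is empty acquire() just waits.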

The average time for subsequent screenshots is now down to ~1.58 seconds! (For the curious: pooling only browsers, without pages, averaged ~1.87 seconds.)

Update: Michael J. Ryan from echojs keenly pointed out that you can go further and pool the pages against a single browser. “Start time of browser instances will be reduced. Chrome creates a separate management/runtime process for each tab/page!” Here's a link to an even more efficient gist for getBrowserPool.
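The linked gist isn't reproduced here. As a minimal sketch of the idea, assuming one browser launched once at startup (for example with await puppeteer.launch()), the page factory only opens tabs in that shared browser and closes individual tabs on eviction. sharedBrowserPageFactory is a hypothetical name.

```javascript
// Hypothetical factory: every pooled page is a tab in one shared browser,
// so Chrome is launched once rather than once per pooled page.
function sharedBrowserPageFactory(browser, viewport = { width: 1200, height: 630 }) {
  return {
    async create() {
      const page = await browser.newPage();
      await page.setViewport(viewport);
      return page;
    },
    // Close just this tab; the shared browser stays up for the others.
    destroy: (page) => page.close(),
  };
}
```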

Take advantage of “single-paged-ness”

One of the advantages of SPAs is that browsers don’t have to reload all resources and re-parse all the scripts upon navigation. However, by calling page.goto each time, we were unnecessarily triggering full page reloads when navigating within the same app.

The solution varies depending on the framework and routing library the app uses, but the basic idea is simple and translates easily:

  1. On the frontend, expose a routing function on the global context (i.e. window) so Puppeteer can trigger route navigation.
  2. Also on the frontend, expose a flag that lets Puppeteer know when the route transition is complete.
  3. Puppeteer flips the flag to false, calls the routing function, and polls the flag until the frontend flips it back to true (signifying that the route change is complete).
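Step 3 could be sketched with Puppeteer's page.evaluate and page.waitForFunction. The two globals, window.__navigate and window.__snapshotReady, are hypothetical names for the routing function and flag from steps 1 and 2, as is navigateInApp; the 50ms polling interval and 10-second timeout are arbitrary choices.

```javascript
// Hypothetical helper for step 3. Assumes the frontend exposes
// window.__navigate(path) and window.__snapshotReady (illustrative names).
async function navigateInApp(page, path, timeout = 10000) {
  // Reset the flag first, then trigger the in-app route change.
  await page.evaluate((p) => {
    window.__snapshotReady = false;
    window.__navigate(p);
  }, path);
  // Poll every 50ms until the frontend flips the flag back to true.
  await page.waitForFunction(() => window.__snapshotReady === true, {
    polling: 50,
    timeout,
  });
}
```

Resetting the flag before calling the router, within the same evaluate call, ensures that a route change finishing immediately is not missed.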

If your app uses React and React Router v4, you could use withRouter somewhere in the app to get the history object, then attach it to window.

In my case, with Vue 1.0 and vue-router 0.7, I added a line to the root component's ready hook to expose the router on window.

Once the route transition has completed (data loading and rendering are done), the frontend flips the flag to signal that it's ready to have its screenshot taken.
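A framework-agnostic sketch of the frontend side, assuming the app has a function that loads a route's data and another that renders it. loadData, render, installSnapshotHooks, and the two window globals are all hypothetical stand-ins; in a real app the flag flip would live in whatever hook runs after a route's data has loaded and rendered.

```javascript
// Hypothetical frontend wiring for steps 1 and 2.
// loadData(path) fetches a route's data; render(data) draws the chart.
function installSnapshotHooks(loadData, render) {
  // Not ready until a route has rendered.
  window.__snapshotReady = false;
  // Step 1: a routing function Puppeteer can call from the page context.
  window.__navigate = async (path) => {
    const data = await loadData(path);
    render(data);
    // Step 2: data is loaded and the chart is rendered, so signal Puppeteer.
    window.__snapshotReady = true;
  };
}
```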

Let's update getChartImage to use those hooks.
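Putting the pieces together, here is a sketch of an updated getChartImage under the same assumptions: a single shared browser, the hypothetical window.__navigate and window.__snapshotReady hooks, and a generic-pool pool. The one new idea is that each pooled page loads the app with page.goto only once, in the factory's create(); every later screenshot navigates inside the already-running SPA. preloadedPageFactory and appUrl are hypothetical.

```javascript
// Hypothetical factory: each pooled page does its single full load here.
function preloadedPageFactory(browser, appUrl) {
  return {
    async create() {
      const page = await browser.newPage();
      await page.setViewport({ width: 1200, height: 630 });
      await page.goto(appUrl, { waitUntil: "networkidle0" });
      return page;
    },
    destroy: (page) => page.close(),
  };
}

// Screenshots reuse the loaded app through in-app route changes.
async function getChartImage(pool, path) {
  const page = await pool.acquire();
  try {
    await page.evaluate((p) => {
      window.__snapshotReady = false;
      window.__navigate(p);
    }, path);
    await page.waitForFunction(() => window.__snapshotReady === true, {
      polling: 50,
      timeout: 10000,
    });
    return await page.screenshot({ type: "png" });
  } finally {
    await pool.release(page);
  }
}
```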

A screenshot now takes only ~860ms! We've managed to cut the time to less than a quarter of the initial implementation's ~3.5 seconds.

There's one more thing we can do.

Memoize

We’d save more resources if we didn’t have to generate these images every time they’re accessed.

When a request for a screenshot comes in, we want to:

  1. Check if a screenshot for that resource already exists.
  2. If it does, check if it’s stale and needs to be updated.
  3. If it exists and is not stale, directly return that file.
  4. If it doesn’t exist or needs to be updated, create a new snapshot.
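The four steps amount to a cache with a maximum age. Here is a minimal sketch; note that it swaps the filesystem for an in-memory Map, which is my simplification to keep the example self-contained (a file cache also survives restarts). memoizeWithMaxAge, generate, and maxAgeMs are hypothetical: generate would wrap getChartImage, and maxAgeMs is whatever staleness window fits the data.

```javascript
// Hypothetical in-memory stand-in for a file-based snapshot cache.
// `now` is injectable so staleness can be tested without waiting.
function memoizeWithMaxAge(generate, maxAgeMs, now = () => Date.now()) {
  const cache = new Map(); // key -> { createdAt, value: Promise }
  return function get(key) {
    const entry = cache.get(key);
    // Steps 1-3: the snapshot exists and isn't stale, so return it directly.
    if (entry && now() - entry.createdAt < maxAgeMs) {
      return entry.value;
    }
    // Step 4: missing or stale, so create a new snapshot.
    const value = Promise.resolve().then(() => generate(key));
    cache.set(key, { createdAt: now(), value });
    // Don't cache failures: drop the entry so the next request retries.
    value.catch(() => {
      if (cache.get(key)?.value === value) cache.delete(key);
    });
    return value;
  };
}
```

Caching the promise rather than the finished image also means that requests arriving while a snapshot is still being generated share that one screenshot instead of each starting another.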

And that's it! Subsequent requests within the staleness window take only ~0.3ms. The next step would be to save the images to a CDN and serve them from there instead of the local filesystem, but I think this is good enough for now :) DigitalOcean's droplet comes with 25 GB of SSD storage and 1 TB of transfer, so I'll save the CDN for when I need it.

Hope this was helpful! Thanks so much to Ben Hare and Jeffrey Burt for their feedback on drafts, and Michael J. Ryan for suggestions and corrections!

And please come check out my site, npmcharts.com, for all your npm package comparison needs! Here's one comparing webpack, browserify, rollup, and parcel:

The project source is available on GitHub. Thanks to @osdevisnot for pointing out that the previous chart included “parcel”, which should have been “parcel-bundler” instead!

📬Subscribe to my newsletter to receive upcoming articles in your inbox

🐥Tweet at me @Cheapsteak

🦈 Digital Ocean $10 referral link

💼Come work with me at Prodigy Game! We’re looking for Senior Full Stack Developers and Backend Developers for our Toronto office :)


Published by HackerNoon on 2018/02/24