Using Yarn with Docker

Facebook recently released Yarn, a new Node.js package manager built on top of the npm registry, massively reducing install times and shipping a deterministic build out of the box.

Determinism has always been a problem with npm, and solutions like npm shrinkwrap do not work well. This makes it hard to use an npm-based system across multiple developers and in continuous integration. Also, npm is slow with complex package.json files, which causes long build times and is a serious blocker when using Docker for local development.

This article discusses how to use Yarn with Docker for Node.js development and deployment.

xkcd take on installing code

TL;DR

Clone the boilerplate:

git clone https://github.com/mfornasa/DockerYarn.git

Enter the directory:

cd DockerYarn

Build the container:

./build.sh

Run it:

docker run yarn-demo node -e "console.log('Hello, World')"

The first time you build the container, Yarn fetches the npm dependencies for you. After that, Yarn runs only when you modify your package.json, and it reuses the cache from previous runs. On top of that, you get determinism: the same dependency tree is installed every time and on every machine. And it’s blazing fast!

Let’s get started

The procedure works on Mac and Linux. We are going to use the RisingStack Node.js Docker image for Node 6. Please install Yarn on your machine before proceeding.

Download the Yarn installation package into a local folder:

wget https://yarnpkg.com/latest.tar.gz

Create a new Dockerfile:

FROM risingstack/alpine:3.4-v6.7.0-4.0.0

WORKDIR /opt/app

# Install yarn from the local .tgz
RUN mkdir -p /opt
ADD latest.tar.gz /opt/
RUN mv /opt/dist /opt/yarn
ENV PATH "$PATH:/opt/yarn/bin"

# Install packages using Yarn
ADD package.json /tmp/package.json
RUN cd /tmp && yarn
RUN mkdir -p /opt/app && cd /opt/app && ln -s /tmp/node_modules

This is based on a well-known trick that uses Docker layer caching to avoid reinstalling all your modules each time you build the container. This way, Yarn is executed only when you change **package.json** (and the first time, of course).
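To make the caching behavior more concrete, the sketch below imitates the decision in plain shell: the dependency step is repeated only when package.json differs from what was seen on the previous run. The function name `deps_changed` and the stamp file argument are hypothetical, and using a checksum is an assumption of this sketch; the actual check is done internally by Docker, not by a script like this.

```shell
# Hypothetical helper: succeeds (exit 0) when the file's checksum differs from
# the one recorded on the previous call, then records the new checksum.
# This only imitates a layer-cache check; it does not talk to Docker.
deps_changed() (
  file=$1
  stamp=$2
  new_sum=$(cksum < "$file")
  old_sum=$(cat "$stamp" 2>/dev/null || true)
  printf '%s\n' "$new_sum" > "$stamp"
  [ "$new_sum" != "$old_sum" ]
)

# Possible use:
#   if deps_changed package.json .package.json.sum; then
#     echo "package.json changed: dependencies would be reinstalled"
#   fi
```

The first call always reports a change, which mirrors the first build of the container, where nothing is cached yet.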

Initialize package.json:

yarn init

Add your first package:

yarn add react

Build and run your new container:

docker build . -t yarn-demo
docker run yarn-demo node -e "console.log('Hello, World')"

Congratulations! You’re using Yarn with Docker.

Wait! What about yarn.lock?

Yarn stores the exact version of each package and sub-package in yarn.lock so that it can reproduce exactly the same dependency tree on each run. Both package.json and yarn.lock must be checked into source control. Since we run Yarn inside the container, we need to retrieve yarn.lock from it. Luckily, it’s not hard to extract yarn.lock after each run. Simply replace the ADD line in the Dockerfile with the following:

ADD package.json yarn.lock /tmp/

and build the container using the following command:

docker build . -t yarn-demo; docker run --rm --entrypoint cat yarn-demo:latest /tmp/yarn.lock > yarn.lock

After the build, yarn.lock is copied to your working directory, and it will be reused on the next Docker build, installing the same dependencies each time.
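One caveat with the one-liner above: the shell truncates yarn.lock before docker run starts, so a failed run can leave an empty lock file behind. A minimal sketch of a more defensive variant, assuming a hypothetical helper named `capture_to` that writes to a temporary file and replaces the target only when the command succeeds:

```shell
# Hypothetical helper: run a command, capture its stdout in a temporary file,
# and replace the target only if the command exits successfully.
capture_to() (
  target=$1
  shift
  tmp=$(mktemp) || exit 1
  if "$@" > "$tmp"; then
    mv "$tmp" "$target"
  else
    status=$?
    rm -f "$tmp"
    exit "$status"
  fi
)

# Intended use (assumes the yarn-demo image from this article has been built):
#   capture_to yarn.lock docker run --rm --entrypoint cat yarn-demo:latest /tmp/yarn.lock
```

If the command fails, the existing yarn.lock is left as it was and the helper returns the command's exit status.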

Congratulations! Now you have deterministic Yarn execution.

Wait! Now Yarn is executed at each container build

That is correct: we are now running Yarn at each build, even if package.json has not been modified. This is because yarn.lock is copied from the container to your working directory each time, even if it has not changed, thus invalidating Docker layer caching. To solve this, we need to copy yarn.lock only if it has really changed. To do so:

Create a build.sh file:

#!/bin/bash

docker build . -t yarn-demo

docker run --rm --entrypoint cat yarn-demo:latest /tmp/yarn.lock > /tmp/yarn.lock
if ! diff -q yarn.lock /tmp/yarn.lock > /dev/null 2>&1; then
  echo "We have a new yarn.lock"
  cp /tmp/yarn.lock yarn.lock
fi

Make it executable:

chmod +x build.sh

Use it to build the container:

./build.sh

Then run the container:

docker run yarn-demo node -e "console.log('Hello, World')"

Congratulations! You now have a deterministic Yarn execution, and Yarn is executed only when you change **package.json**.
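The copy-only-if-changed step in build.sh can also be isolated as a small reusable function. This is a sketch, not part of the original boilerplate; the name `update_if_changed` is hypothetical, and it uses cmp instead of diff, which is an equivalent choice for a byte-for-byte comparison.

```shell
# Hypothetical helper: copy src over dst only when their contents differ
# (or dst does not exist yet). Returns 0 if dst was updated, 1 if left alone.
update_if_changed() (
  src=$1
  dst=$2
  if cmp -s "$src" "$dst" 2>/dev/null; then
    exit 1   # identical: leave dst untouched
  fi
  cp "$src" "$dst"
)

# Possible use inside build.sh, after extracting the lock file to /tmp:
#   if update_if_changed /tmp/yarn.lock yarn.lock; then
#     echo "We have a new yarn.lock"
#   fi
```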

What about Yarn package cache?

Another powerful feature of Yarn is its package cache, which is stored on the local filesystem to avoid downloading packages again. Our procedure so far does not preserve the cache across container builds. This could be an issue for big package.json files.

The following build.sh solves the issue by saving the Yarn cache in your working directory.

#!/bin/bash

# Init empty cache file
if [ ! -f .yarn-cache.tgz ]; then
  echo "Init empty .yarn-cache.tgz"
  tar cvzf .yarn-cache.tgz --files-from /dev/null
fi

docker build . -t yarn-demo

docker run --rm --entrypoint cat yarn-demo:latest /tmp/yarn.lock > /tmp/yarn.lock
if ! diff -q yarn.lock /tmp/yarn.lock > /dev/null 2>&1; then
  echo "Saving Yarn cache"
  docker run --rm --entrypoint tar yarn-demo:latest czf - /root/.yarn-cache/ > .yarn-cache.tgz
  echo "Saving yarn.lock"
  cp /tmp/yarn.lock yarn.lock
fi

You also need to add this to your Dockerfile, after the ADD package.json... line:

# Copy cache contents (if any) from local machine
ADD .yarn-cache.tgz /

The cache file is not meant to be pushed to the repo, so it should be added to a .gitignore file.
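Adding the entry by hand works fine. If you prefer to do it from a script, the sketch below appends the entry only when it is not already listed, so running it repeatedly does not create duplicates. The function name `ignore_once` is hypothetical.

```shell
# Hypothetical helper: append an entry to a .gitignore-style file unless an
# identical line is already present.
ignore_once() (
  entry=$1
  file=${2:-.gitignore}
  touch "$file"
  if ! grep -qxF -- "$entry" "$file"; then
    printf '%s\n' "$entry" >> "$file"
  fi
)

# Possible use from the project root:
#   ignore_once .yarn-cache.tgz
```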

Congratulations, again! You now have a deterministic Yarn execution that runs only when you change **package.json** and uses Yarn caching. Try this with a complex package.json file from a real project; you will be amazed!
