How To Improve Your Docker Build Time in GitLab CI

Written by EmmanuelSys | Published 2020/11/25

TLDR: Building Docker images at every code change can dominate your CI run time. This article walks through several strategies for speeding up Docker image builds in a continuous integration pipeline: a multi-stage Dockerfile (a light builder stage installs the build tools and downloads or compiles the dependencies into a Python virtual environment, then a slim final stage copies the virtual env and adds the application files), pull/push layer caching against a remote registry, and a shared Docker-in-Docker service backed by persistent storage.

Make your containerized CI environments truly useful by accelerating your Docker builds
The modern software development cycle often means packaging your application as a container. This task can be time consuming and may slow down your testing or deployment significantly. The problem is especially visible in a continuous integration and deployment process, where images are built at every code modification.
In this article, we will discuss various ways of speeding up the build time of Docker images in a continuous integration pipeline by implementing different strategies.

Packaging a sample application locally

As an example, we will start with a simple Python Flask application. It cannot get much simpler than this:
# app/hello.py
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'
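Before containerizing anything, you can sanity-check the route in-process with Flask's built-in test client. This is a quick sketch, assuming Flask is installed locally:

```python
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

# Flask's test client exercises the route without starting a server
with app.test_client() as client:
    response = client.get('/')
    assert response.status_code == 200
    print(response.get_data(as_text=True))
```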
Writing the Dockerfile
Let’s write the corresponding Dockerfile:
FROM python:3.7-alpine as builder

# install dependencies required to build python packages
RUN apk update && apk add --no-cache make gcc && pip install --upgrade pip


# setup venv and download or build dependencies
ENV VENV="/venv"
ENV PATH="${VENV}/bin:${PATH}"

COPY requirements.txt .
RUN python -m venv ${VENV} \
    && pip install --no-cache-dir -r requirements.txt

FROM python:3.7-alpine

# setup venv with dependencies from the builder stage
ENV VENV="/venv"
ENV PATH="${VENV}/bin:$PATH"
COPY --from=builder ${VENV} ${VENV}

# copy app files
WORKDIR /app
COPY app .

# run the app
EXPOSE 5000
ENV FLASK_APP="hello.py"
CMD [ "flask", "run", "--host=0.0.0.0" ]
You can see here a classic multi-stage build process:
  • We start with a light base image in which we install the build tools and download or compile the dependencies into a Python virtual environment
  • In the second stage, we copy the virtual env with our dependencies into the target image and finally add the application files
Why this two-stage process? First, the build is reproducible and isolated: it runs entirely in a container, without interference from the host environment. Second, you get a slim final image that contains only what is required to run the app, without all the build libraries.
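One detail worth noting: the builder stage copies a requirements.txt that is not shown above. For this app it could be as minimal as a single line (a sketch; in practice you would pin versions):

```
# requirements.txt (assumed; not shown in the original)
flask
```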
Running and testing the image
Making sure everything is working as expected:
docker build -t hello .
docker run -d --rm -p 5000:5000 hello
curl localhost:5000
Hello, World!
If you run the docker build command a second time:
docker build -t hello .
...
Step 2/15 : RUN apk update && apk add --no-cache make gcc && pip install --upgrade pip
 ---> Using cache
 ---> 24d044c28dce
...
As you can see, this second build is much quicker: layers are cached by your local Docker daemon and reused when nothing has changed.
Pushing the image
Let’s publish our image to an external registry and see what happens:
docker tag hello my-registry/hello:1.0
docker push my-registry/hello:1.0

The push refers to repository [my-registry/hello]
8388d558f57d: Pushed 
77a59788172c: Pushed 
673c6888b7ef: Pushed 
fdb8581dab88: Pushed
6360407af3e7: Pushed
68aa0de28940: Pushed
f04cc38c0ac2: Pushed
ace0eda3e3be: Pushed
latest: digest: sha256:d815c1694083ffa8cc379f5a52ea69e435290c9d1ae629969e82d705b7f5ea95 size: 1994
Note how each intermediate layer is identified by a hash. We count 8 layers because we have exactly 8 Docker instructions in our Dockerfile after the last FROM instruction.
It’s important to understand that layers from our builder stage are not sent to the remote Docker registry when we push our image; only layers from the last stage are pushed. The intermediate layers are still cached by the local Docker daemon, though, so they can be reused by your next local build.
Local builds work fine; let’s now see how this plays out in a CI environment.

Building the Docker image in a CI pipeline context

In real life, Docker images aren’t necessarily built and pushed locally like this; they are typically built inside a continuous integration and deployment platform. You want to build and push your image at every code change before deploying your application. Build time is therefore critical, as you want a very fast feedback loop.
Test CI environment
We will use a CI environment leveraging GitLab for the pipelines and a GitLab Runner configured with the Kubernetes executor to run the jobs.
The last point is important because our CI jobs will run in a containerized environment: each job is spawned as a Kubernetes Pod. Every modern CI solution uses containerized jobs, and all face the same problem when trying to build Docker containers: you need to make the docker commands work inside a Docker container.
To make everything go smoothly you have two options:
  • Binding /var/run/docker.sock, the socket on which the host’s Docker daemon listens, into the job container, effectively making the host daemon available to our job
  • Running an additional “Docker in Docker” (aka dind) container alongside your job. Dind is a special Docker variant that runs as privileged and is configured to be able to run inside Docker itself 😵
We will use the latter option for simplicity.
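For reference, the first (socket-binding) option would look roughly like this in the GitLab Runner config.toml for the Kubernetes executor. This is a sketch, not the configuration used in this article; check the section names against your runner version:

```toml
# hypothetical excerpt: bind the node's Docker socket into every job Pod
[[runners]]
  executor = "kubernetes"
  [runners.kubernetes]
    # mount the host socket so `docker` in the job talks to the node's daemon
    [[runners.kubernetes.volumes.host_path]]
      name = "docker-sock"
      mount_path = "/var/run/docker.sock"
      host_path = "/var/run/docker.sock"
```

The trade-off: jobs share the node’s daemon and its layer cache, but they also get full control over the host’s Docker, which is a security concern on shared runners.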
GitLab pipeline implementation
In a GitLab pipeline, you usually create utility containers like dind by means of the services keyword.
In the pipeline excerpt below, the docker-build job and the dind service container run in the same Kubernetes Pod. When docker is used in the job’s script, it sends its commands to the dind auxiliary container, thanks to the DOCKER_HOST environment variable.
stages:
  - build
  - test
  - deploy

variables:
  # disable Docker TLS validation
  DOCKER_TLS_CERTDIR: ""
  # localhost is shared by the job container and the dind container (they run in the same Pod),
  # so this configuration makes the dind service our Docker daemon when running Docker commands
  DOCKER_HOST: "tcp://localhost:2375"

services:
  - docker:stable-dind

docker-build:
  image: docker:stable
  stage: build
  script:
      - docker build -t hello .
      - docker tag hello my-registry/hello:${CI_COMMIT_SHORT_SHA}
      - docker push my-registry/hello:${CI_COMMIT_SHORT_SHA}

Running the pipeline

This pipeline should run fine. By running it once and checking the job output we have:
docker build -t hello .

Step 1/15 : FROM python:3.7-alpine as builder
...
Step 2/15 : RUN apk update && apk add --no-cache make gcc && pip install --upgrade pip
---> Running in ca50f59a21f8
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKINDEX.tar.gz
...
As this is the first time we are building the image, every layer is built by executing its instruction. The total run time of the job is around one minute.
If you run your pipeline a second time without changing anything, you should observe the same thing: every layer is rebuilt! When we ran our build locally, cached layers were reused, but not here. For such a simple image it doesn’t really matter, but in real life, where some images may take tens of minutes to build, it can be a real hassle.
Why is that? Simply because here dind is a temporary container, created with the job and destroyed once the job is done, so any cached data is lost. Sadly, you cannot easily persist this data between two pipeline runs.
How can we benefit from the cache while still running a dind container?

Benefiting from the Docker cache while running Docker in Docker

One solution: Pull/Push dancing
The first solution is rather straightforward: we will use our remote registry (the one we push into) as a remote cache for our layers.
More precisely:
  • We start by pulling the most recent image (i.e. latest) from the remote registry, to be used as a cache for the subsequent docker build command.
  • Then we build the image, using the pulled image as a cache (the --cache-from argument) if available. We tag this new build with latest and with the commit SHA.
  • Finally, we push both tagged images to the remote registry so that they can be used as a cache for subsequent builds.
    stages:
      - build
      - test
      - deploy
        
    variables:
      # disable Docker TLS validation
      DOCKER_TLS_CERTDIR: ""
      DOCKER_HOST: "tcp://localhost:2375"
    
    services:
      - docker:stable-dind
    
    docker-build:
      image: docker:stable
      stage: build
      script:
        - docker pull my-registry/hello:latest || true
        - docker build --cache-from my-registry/hello:latest -t hello:latest .
    
        - docker tag hello:latest my-registry/hello:${CI_COMMIT_SHORT_SHA}
        - docker tag hello:latest my-registry/hello:latest
    
        - docker push my-registry/hello:${CI_COMMIT_SHORT_SHA}
        - docker push my-registry/hello:latest
If you run this new pipeline twice, the cache use is still disappointing.
The layers from the builder stage are all rebuilt. Only the first two layers (steps 8 and 9) of the final stage use the cache; the following layers are rebuilt.
As we saw earlier when pushing our image locally, the layers of the builder stage are not pushed to the remote registry and are effectively lost. Consequently, when we pull the latest image, they are not there and need to be rebuilt.
Then, when our final stage is built (steps 8 to 15), the first two layers are present in the image we pulled and used as a cache. But step 10 copies the dependencies from the builder stage, which has changed, so every subsequent step is built again.
To sum it up, the cache use is modest, with only 2 steps out of 15 benefiting from it! To improve this, we need to push the intermediate builder image to the remote registry to persist its layers:
    stages:
      - build
      - test
      - deploy
        
    variables:
      # disable Docker TLS validation
      DOCKER_TLS_CERTDIR: ""
      DOCKER_HOST: "tcp://localhost:2375"
    
    services:
      - docker:stable-dind
    
    docker-build:
      image: docker:stable
      stage: build
      script:
        - docker pull my-registry/hello-builder:latest || true
        - docker pull my-registry/hello:latest || true
    
        - docker build --cache-from my-registry/hello-builder:latest --target builder -t hello-builder:latest .
        - docker build --cache-from my-registry/hello:latest --cache-from my-registry/hello-builder:latest -t hello:latest .
    
        - docker tag hello-builder:latest my-registry/hello-builder:latest    
        - docker tag hello:latest my-registry/hello:${CI_COMMIT_SHORT_SHA}
        - docker tag hello:latest my-registry/hello:latest
    
        - docker push my-registry/hello-builder:latest
        - docker push my-registry/hello:${CI_COMMIT_SHORT_SHA}
        - docker push my-registry/hello:latest
We build our builder intermediate stage as a proper Docker image using the --target option. We then push it to the remote registry and pull it back as a cache when building our final image. With this pipeline, the build time is down to about 15 seconds!
You can see the build is slowly becoming quite complicated. If you are lost, just imagine an image with 3 or 4 intermediate stages! It does work, though. Another drawback is that you have to upload and download all these layers every time, which may be quite expensive in storage and transfer costs.
Another solution: external dind service
We need a dind service running to execute our docker build commands. In our previous attempt, dind is embedded in each job and shares the job’s lifecycle, making it impossible to build a durable cache.
Why not make dind a first-class citizen by creating a dind service in our Kubernetes cluster? It would run with a PersistentVolume attached to hold the cached data, and every job could send its docker commands to this shared service.
Creating such a service in Kubernetes is easy:
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      labels:
        app: docker-dind
      name: dind
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 500Gi
    
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: docker-dind
      name: dind
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: docker-dind
      template:
        metadata:
          labels:
            app: docker-dind
        spec:
          containers:
            - image: docker:19.03-dind
              name: docker-dind
              env:
                - name: DOCKER_HOST
                  value: tcp://0.0.0.0:2375
                - name: DOCKER_TLS_CERTDIR
                  value: ""
              volumeMounts:
                - name: dind-data
                  mountPath: /var/lib/docker/
              ports:
                - name: daemon-port
                  containerPort: 2375
                  protocol: TCP
              securityContext:
                privileged: true #Required for dind container to work.
          volumes:
            - name: dind-data
              persistentVolumeClaim:
                claimName: dind
                
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: docker-dind
      name: dind
    spec:
      ports:
        - port: 2375
          protocol: TCP
          targetPort: 2375
      selector:
        app: docker-dind
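Assuming these manifests are saved as dind-deployment.yaml (a hypothetical filename), deploying the shared daemon is a single command:

```shell
# create the PVC, Deployment and Service in the current namespace
kubectl apply -f dind-deployment.yaml

# from a Pod inside the cluster, check that the daemon answers
docker -H tcp://dind:2375 info
```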
Then we slightly modify our original GitLab pipeline to point to this new external service and remove the built-in dind service:
    stages:
      - build
      - test
      - deploy
        
    variables:
      # disable Docker TLS validation
      DOCKER_TLS_CERTDIR: ""
      # here the dind hostname is resolved as the Kubernetes dind service by the kube dns
      DOCKER_HOST: "tcp://dind:2375"
    
    docker-build:
      image: docker:stable
      stage: build
      script:
        - docker build -t hello .
        - docker tag hello:latest my-registry/hello:${CI_COMMIT_SHORT_SHA}
        - docker push my-registry/hello:${CI_COMMIT_SHORT_SHA}
If you run the pipeline twice, the second build should take about 10 seconds, even better than our previous solution. For a “big” image taking around 10 minutes to build, this strategy also reduces the build time to a few seconds when no layers have changed.
One last option: using Kaniko
A final option is to use Kaniko. With it, you can build Docker images without a Docker daemon, which makes everything we saw above a non-issue.
However, note that with Kaniko you cannot use advanced BuildKit options, for example injecting secrets at build time. For this reason, it’s not the solution I retained.
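As a sketch, a Kaniko-based build job could replace our docker-build job like this (based on the common GitLab pattern; the registry name is our running example, and registry authentication setup is omitted):

```yaml
docker-build-kaniko:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    # Kaniko builds and pushes in one step, no Docker daemon needed
    - /kaniko/executor
        --context "${CI_PROJECT_DIR}"
        --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
        --destination "my-registry/hello:${CI_COMMIT_SHORT_SHA}"
        --cache=true
```

The --cache=true flag lets Kaniko cache layers in the registry, playing roughly the same role as our pull/push dance.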

Conclusion

As software development makes heavy use of containers everywhere, building them efficiently is key to your release pipeline. As we’ve seen, the problem can become quite complex, and every solution has its trade-offs. The solutions proposed here are illustrated with GitLab, but keep in mind that they hold for any other containerized CI environment.

Published by HackerNoon on 2020/11/25