To commit or not to commit?

There are two* ways to create a Docker image:

Write a Dockerfile and run docker build on it
Run a container, make changes and run docker commit to create a new image.

The commit approach feels like a nice way of saving your work as you go along. Realise you need another package? Run apt-get install inside the container and then docker commit outside, and you’ve got a new image ready to go with that package ready-installed.

Don’t do it.

No, really, I know it’s tempting, but if you’re ever going to use this container image again, don’t use commit. There are significant downsides to the commit approach:

You can’t reproduce the image
You can’t change the base image

Creating a Dockerfile might feel a bit like work, but if you’re ever going to want this image again, it’s the approach you need to take.

Dockerfiles for reproducible(-ish) images

With the Dockerfile approach you have a written-down list of what went into the image. Re-run the docker build and you’ll get the same image again.

Well, near enough. It may well not be bit-for-bit identical. Both the base image and any code you’re installing from elsewhere could have changed since the last time you ran the build. It’s up to you to define your dependencies with whatever level of specificity you need — for example, you could specify the base image by SHA to be completely sure of the version you’re getting.
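To make the pin-by-SHA idea concrete, here is a minimal sketch. It assumes the base image has already been pulled from a registry. The helper names and the ubuntu:22.04 tag used below are placeholders of mine rather than anything from a real project. The first function asks Docker for the image’s repo digest, and the second writes a FROM line that refers to it.

```shell
# Print the digest-pinned reference (repo@sha256:...) for a pulled image.
# Assumes the image came from a registry; a locally built image that was
# never pushed has no repo digest to report.
pinned_ref() {
  docker inspect --format '{{index .RepoDigests 0}}' "$1"
}

# Start a Dockerfile whose base image is pinned to that digest.
# Hypothetical helper: it overwrites ./Dockerfile.
write_pinned_from() {
  ref="$(pinned_ref "$1")" || return 1
  printf 'FROM %s\n' "$ref" > Dockerfile
}
```

After a docker pull ubuntu:22.04, running write_pinned_from ubuntu:22.04 should leave a Dockerfile whose first line has the form FROM ubuntu@sha256:<digest>. The build then uses that exact base image until you change the line yourself.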

If you use the Dockerfile approach, you’ll add a RUN directive for each package you install — or, better, concatenate them together into a single RUN, reducing the number of layers in the image.
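As a rough sketch of the single-RUN style, the snippet below writes a small Dockerfile from the shell. The base image tag and the curl package are assumptions for illustration; libcgroup-dev simply stands in for whatever package you found you needed.

```shell
# Write an example Dockerfile (overwrites ./Dockerfile).
# The quoted 'EOF' stops the shell from expanding anything inside.
cat > Dockerfile <<'EOF'
# Assumed base image; use whatever your project is built on.
FROM ubuntu:22.04

# One RUN, so one layer: refresh the package index, install everything,
# then delete the index files so they don't end up in that layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      curl \
      libcgroup-dev \
 && rm -rf /var/lib/apt/lists/*
EOF
```

Build it with docker build -t myimage . (myimage is a placeholder name). When you realise you need another package, add it to the list and rebuild rather than reaching for commit.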

Tip: If you’re using apt, you can easily see what has been installed by looking at /var/log/apt/history.log, which is full of entries like this:

Start-Date: 2017-03-17  17:00:58
Commandline: apt-get install libcgroup-dev
Install: libcgroup1:amd64 (0.41-6, automatic), libcgroup-dev:amd64 (0.41-6)
End-Date: 2017-03-17  17:01:00

This is easier to use for constructing your Dockerfile than apt list --installed, which lists every package in the image, so the ones you installed yourself are buried among the base image’s packages and their dependencies.
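If the log is long, a one-line filter can pull out just the install commands. This is a small sketch rather than a polished tool: apt_installs is a made-up helper name, and it only matches entries whose command line starts with apt-get install, as in the entry above.

```shell
# List the arguments of every "apt-get install" recorded in an apt history log.
# Defaults to the standard log path; pass another file as the first argument.
apt_installs() {
  sed -n 's/^Commandline: apt-get install //p' "${1:-/var/log/apt/history.log}"
}
```

Run inside the container, apt_installs prints one line per install command, with any flags you typed (such as -y) left in place. Each line is a candidate for the package list in your Dockerfile’s RUN directive.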

That’s only going to help you with packages you installed with apt, of course. Don’t forget to add the installation of any language- or app-specific requirements you might have.

Changing the base image

Sometimes it’s a very good idea to upgrade the base image you’re using. An obvious case is when a vulnerability has been found and patched in a newer version of the base image. You’d like to take that patch in your container, right?

If you’ve got an image that you’ve saved with a docker commit, well, good luck with that.

<thinking out loud>If an image is made up of layers, and the layers are basically tarballs, can’t you manually start with the new base image layers, and then apply the other (committed) layers on top? Or does the layer “diff” only work if the layer beneath is bit-identical to what it was created from? If anyone has tried this let me know, otherwise I feel an experiment coming on…</thinking out loud>

Dockerfile as documentation

As a side effect, the Dockerfile describes what’s inside your image in a fairly human-readable way. Reading a Dockerfile can be a lot quicker than pulling and running the image to see what’s installed.

Of course, this assumes you have kept the Dockerfile somewhere, and that you can easily find the Dockerfile that corresponds to the image and vice versa.

At the moment the tooling that ties images and their source code together is all still pretty Heath-Robinson in my opinion. No-one’s going to make you check your Dockerfile in anywhere, and even if you do, can you always remember which project you put it in? Was it in GitHub or Bitbucket? Maybe it was just in a gist? Do all your source code projects match the names of the images in your image registry?

A couple of options for things you can do here:

Copy the Dockerfile into the image as you build it.
Use conventional labels so that it’s easy to see where the source code came from.
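Both options can live in the same Dockerfile. The sketch below is one way to do it, under some assumptions: I’ve picked the label keys from the OCI image spec’s pre-defined annotations as the convention, and the base image and the VCS_URL and VCS_REF build arguments are names chosen for this example.

```shell
# Write an example Dockerfile (overwrites ./Dockerfile).
# The quoted 'EOF' keeps ${VCS_URL} and ${VCS_REF} literal for Docker to expand.
cat > Dockerfile <<'EOF'
# Assumed base image.
FROM ubuntu:22.04

# Values supplied at build time with --build-arg.
ARG VCS_URL
ARG VCS_REF

# Record where the source lives and which revision was built.
LABEL org.opencontainers.image.source="${VCS_URL}" \
      org.opencontainers.image.revision="${VCS_REF}"

# Keep a copy of this file inside the image itself.
COPY Dockerfile /Dockerfile
EOF
```

You would build it with something like docker build --build-arg VCS_URL=<your repo URL> --build-arg VCS_REF="$(git rev-parse HEAD)" -t myimage . and read the labels back later with docker inspect --format '{{json .Config.Labels}}' myimage. The copy at /Dockerfile is there for anyone who only has the image to hand.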

When not OK is OK, really

There are situations where it’s tolerable to use commit. Say you’re running an experiment that you think might break something quite badly: maybe you’re running a script with some scary rm -rf's in it, or you’re mounting a pseudo filesystem and you’re not 100% sure what effect it’s going to have. By all means commit the image before you run the experiment, and then you can easily come back to where you were.
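For that checkpoint-before-an-experiment case, a tiny wrapper keeps the snapshots easy to find and easy to throw away. This is only a sketch: checkpoint and the experiment-checkpoint repository name are inventions for this example.

```shell
# Snapshot a container as a throwaway image and print the image name.
# "experiment-checkpoint" is a made-up repository name for this sketch.
checkpoint() {
  tag="experiment-checkpoint:$(date +%Y%m%d-%H%M%S)"
  docker commit "$1" "$tag" > /dev/null && echo "$tag"
}
```

Run snapshot=$(checkpoint mycontainer) before the scary part, where mycontainer is a placeholder for your container’s name or ID. If it all goes wrong, docker run -it "$snapshot" starts a fresh container from the saved state, and docker rmi "$snapshot" tidies up once you no longer need it. One caveat: commit does not include data in mounted volumes, so it won’t protect anything the experiment deletes there.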

Another situation where you could conceivably not ruin yourself with commit is for a temporary container image that you’re using to run something as a one-off. There’s an example of this in Jérôme Petazzoni’s excellent tutorial on compiling Go code using Docker. If your goal, as here, is to run a binary, and the container image is merely a temporary tool to help you do that, then a commit approach makes sense.

But, if you’re going to want the image in the future, take the time to create something reproducible and write a Dockerfile.

*You could also create an image from a tarball with docker import but I doubt many people are doing that. Or are you?

Help me help people with their commitment issues by hitting the recommend button 💚. Thanks!

Picture credit: m.a.r.c