Inspecting Docker images without pulling them

Written by beld_pro | Published 2017/11/26


Hey,

depending on what you’re trying to build, it might happen that part of it involves inspecting a Docker image from a registry, but you can’t afford to pull it.

It turns out that there’s an API that allows you to perform exactly that — be it DockerHub or a private registry.

The Docker Registry HTTP API is the protocol to facilitate distribution of images to the docker engine. It interacts with instances of the docker registry, which is a service to manage information about docker images and enable their distribution.

Testing it locally

The first step to test it locally is to spin up a registry from the library/registry image.
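Something along these lines does it (port 5000 is the usual choice; pushing alpine is just to have an image to inspect later):

```bash
# Run a local registry (V2) from the official image, listening on port 5000.
docker run --detach --publish 5000:5000 --name registry registry:2

# Tag and push a small image to it so there is something to inspect later.
docker pull alpine
docker tag alpine localhost:5000/alpine
docker push localhost:5000/alpine
```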

Check that it’s definitely working:
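A plain curl against the /v2/ endpoint is enough; a 200 with an empty JSON body means the registry is up and speaking the V2 protocol:

```bash
$ curl --silent http://localhost:5000/v2/
{}
```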

Having the local registry working, we can move to the script that inspects images right from the registry metadata. The following script contains all that it takes to retrieve the image configuration and relies on only two dependencies: bash and jq.
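Here is a minimal sketch of such a script, assuming the registry from the previous step is listening on localhost:5000 (the script and function names are just illustrative):

```bash
#!/bin/bash
# get-image-config.sh: inspect an image in a local, unauthenticated registry
# without pulling it. Usage: ./get-image-config.sh <image> <tag>
set -o errexit

readonly REGISTRY="localhost:5000"
readonly IMAGE="$1"
readonly TAG="$2"

# Retrieve the digest of the image configuration from the V2 manifest.
# The Accept header is mandatory: without it the registry falls back to
# the older manifest format, which carries no config digest.
get_digest() {
  local image=$1
  local tag=$2

  curl --silent \
    --header "Accept: application/vnd.docker.distribution.manifest.v2+json" \
    "http://$REGISTRY/v2/$image/manifests/$tag" |
    jq -r '.config.digest'
}

# Fetch the configuration blob that the digest points at and keep only
# the container configuration.
get_image_configuration() {
  local image=$1
  local digest=$2

  curl --silent \
    "http://$REGISTRY/v2/$image/blobs/$digest" |
    jq '.config'
}

main() {
  local digest

  digest=$(get_digest "$IMAGE" "$TAG")
  get_image_configuration "$IMAGE" "$digest"
}

main "$@"
```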

P.S.: it’s important to note that the API calls need to specify, via the Accept header, the type of content they accept (_application/vnd.docker.distribution.manifest.v2+json_).

Let’s now check if it’s working for real:
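Running it against the alpine image that was pushed earlier should print the image’s configuration (environment, command and so on), along these lines:

```bash
$ ./get-image-config.sh alpine latest
{
  "Env": [
    "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
  ],
  "Cmd": [
    "/bin/sh"
  ],
  ...
}
```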

Cool, it indeed works!

While that does the job for registries that require no authentication, it’s not suitable for DockerHub. When checking a public image there, we need to perform an extra step before getting the digest of an image: retrieving a token. With such a token in hand, we can then inspect either public or private images.

Going back to scripting, let’s create a different script to deal with this case and call it get-public-image-config.sh (this is for brevity’s sake; using some other programming language, you could add a few conditionals and detect each case).

The additional code can be placed in a function called get_token, which takes only the image as an argument:
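A minimal version of it asks DockerHub’s token service (auth.docker.io) for a pull-scoped token and extracts it with jq:

```bash
# Retrieve a read-only (pull) token for a public image from DockerHub's
# token service. No credentials are needed for public images.
get_token() {
  local image=$1

  curl --silent \
    "https://auth.docker.io/token?scope=repository:$image:pull&service=registry.docker.io" |
    jq -r '.token'
}
```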

With the token in hand, it’s just a matter of making use of it in the other calls.

If we were targeting private images, we’d modify get_token a little bit: the call that gets the token from auth.docker.io would need to be made with a DockerHub username and password pair (without authentication we can only access public images). To do so, specify an authorization header in that call (the --user flag in curl):
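Sketching it, the only change is the extra --user flag (USERNAME and PASSWORD are placeholders for your own DockerHub credentials):

```bash
# Same as before, but now the token request carries DockerHub credentials,
# which is what grants access to private repositories.
get_token() {
  local image=$1

  curl --silent \
    --user "$USERNAME:$PASSWORD" \
    "https://auth.docker.io/token?scope=repository:$image:pull&service=registry.docker.io" |
    jq -r '.token'
}
```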

Now with this new token we can retrieve the digest of a given image and tag (pay attention to the extra Authorization: Bearer $token header that we added):
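For instance, against DockerHub’s registry endpoint (registry-1.docker.io is used here; the shape mirrors the earlier get_digest):

```bash
# Retrieve the digest of the image configuration for a given image and tag,
# this time authenticating with the bearer token we just obtained.
get_digest() {
  local token=$1
  local image=$2
  local tag=$3

  curl --silent \
    --header "Accept: application/vnd.docker.distribution.manifest.v2+json" \
    --header "Authorization: Bearer $token" \
    "https://registry-1.docker.io/v2/$image/manifests/$tag" |
    jq -r '.config.digest'
}
```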

With that, we can have a full script that retrieves the configuration of public images from DockerHub (check how in main we first retrieve a token and then pass that token to the following functions):
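Putting the pieces together, a sketch of the whole get-public-image-config.sh could look like this:

```bash
#!/bin/bash
# get-public-image-config.sh: inspect a public DockerHub image without pulling it.
# Usage: ./get-public-image-config.sh <image> <tag>   (e.g. library/nginx latest)
set -o errexit

readonly REGISTRY="https://registry-1.docker.io"
readonly IMAGE="$1"
readonly TAG="$2"

# Grab a pull-scoped token for the image from DockerHub's token service.
get_token() {
  local image=$1

  curl --silent \
    "https://auth.docker.io/token?scope=repository:$image:pull&service=registry.docker.io" |
    jq -r '.token'
}

# Retrieve the digest of the image configuration from the V2 manifest.
get_digest() {
  local token=$1
  local image=$2
  local tag=$3

  curl --silent \
    --header "Accept: application/vnd.docker.distribution.manifest.v2+json" \
    --header "Authorization: Bearer $token" \
    "$REGISTRY/v2/$image/manifests/$tag" |
    jq -r '.config.digest'
}

# Fetch the configuration blob (following the redirect to the blob storage)
# and keep only the container configuration.
get_image_configuration() {
  local token=$1
  local image=$2
  local digest=$3

  curl --silent --location \
    --header "Authorization: Bearer $token" \
    "$REGISTRY/v2/$image/blobs/$digest" |
    jq '.config'
}

main() {
  local token digest

  token=$(get_token "$IMAGE")
  digest=$(get_digest "$token" "$IMAGE" "$TAG")
  get_image_configuration "$token" "$IMAGE" "$digest"
}

main "$@"
```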

Note: again, you must use the full name of the image (official images live under the _library_ repository, so _nginx_ should be referred to as _library/nginx_).

To make sure that it works, run it against an image like nginx:
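For instance (output trimmed; the exact fields depend on the image):

```bash
$ ./get-public-image-config.sh library/nginx latest
{
  "ExposedPorts": {
    "80/tcp": {}
  },
  "Env": [
    "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
    ...
  ],
  ...
}
```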

If you try to retrieve an image that is not very new (say, one that is a couple of years old), you’ll notice that the script I posted above might not work.

The reason is that images pushed to the Docker registry a long time ago won’t use schema version 2 of the V2 manifest. However, they still expose the image configuration, just embedded as a plain JSON string inside the manifest itself.

Below is a script that deals with that case:
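This sketch fetches the manifest without the V2 Accept header (so the registry serves the old schema) and pulls the configuration out of the embedded JSON string:

```bash
#!/bin/bash
# get-old-image-config.sh: inspect an image that only has an old-style
# (schema 1) manifest, where the configuration is embedded as a string.
# Usage: ./get-old-image-config.sh <image> <tag>
set -o errexit

readonly REGISTRY="https://registry-1.docker.io"
readonly IMAGE="$1"
readonly TAG="$2"

# Grab a pull-scoped token for the image from DockerHub's token service.
get_token() {
  local image=$1

  curl --silent \
    "https://auth.docker.io/token?scope=repository:$image:pull&service=registry.docker.io" |
    jq -r '.token'
}

# Old manifests carry a list of per-layer configurations as plain JSON
# strings (history[].v1Compatibility). The first entry corresponds to the
# uppermost layer, which aggregates all the information, so we parse that
# string and keep only the config.
get_image_configuration() {
  local token=$1
  local image=$2
  local tag=$3

  curl --silent \
    --header "Authorization: Bearer $token" \
    "$REGISTRY/v2/$image/manifests/$tag" |
    jq -r '.history[0].v1Compatibility' |
    jq '.config'
}

main() {
  local token

  token=$(get_token "$IMAGE")
  get_image_configuration "$token" "$IMAGE" "$TAG"
}

main "$@"
```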

If you’re looking for the difference, look at main. Essentially, we abandon the idea of retrieving a digest and simply pick the “old config”: from the manifest we take the first entry in the history list, which represents the uppermost layer, the one that contains all the info altogether. From there we parse that plain-text JSON and then get the config.

Closing thoughts

Interacting with DockerHub or a private registry is not all that hard; it’s just not very well documented. With these scripts it becomes pretty easy to get it working in any language you want: just add some checks, parse the image names and you should be good to go.


I can't finish the article without mentioning some alternatives that people suggested after I first published the article in ops.tips:

  • https://github.com/GoogleCloudPlatform/container-diff — "Diff your Docker containers" — it looks pretty interesting, but even though you can prefix an image with remote:// when analyzing it, it looks like it always pulls the entire image. Maybe I got something wrong?
  • https://github.com/projectatomic/skopeo — “Work with remote images registries — retrieving information, images, signing content” — well, it does exactly what it says! I totally recommend it for inspecting images or repositories without pulling them 👍

By the way, if you want to try Skopeo, building it from source is very easy on macOS:
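Roughly (this assumes a Go toolchain with GOPATH set up and takes the gpgme dependency from Homebrew; check the project’s README for the exact steps):

```bash
# Skopeo's build needs gpgme; on macOS it can come from Homebrew.
brew install gpgme

# Fetch the source into GOPATH and build the binary locally.
git clone https://github.com/projectatomic/skopeo \
  "$GOPATH/src/github.com/projectatomic/skopeo"
cd "$GOPATH/src/github.com/projectatomic/skopeo"
make binary-local
sudo make install
```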

Now inspecting an image or a repository from DockerHub is one command away:
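For instance, inspecting nginx straight from DockerHub (output trimmed for brevity):

```bash
$ skopeo --override-os linux inspect docker://docker.io/library/nginx:latest
{
    "Name": "docker.io/library/nginx",
    "Architecture": "amd64",
    "Os": "linux",
    ...
}
```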

Note that here I’m specifying the --override-os flag to the command. The reason is that otherwise it’ll try to inspect the image or repository filtering by digests marked with OS=darwin. If you’re using Linux, you wouldn’t need that flag.

If you’re not willing to implement this in your favorite language and just want to gather the configuration of an image, make sure you check out Skopeo.

Please let me know if I got anything wrong and/or if there’s an easier way to do it.

Have a good one!

finis

Originally published at ops.tips on November 26, 2017.

