How to Properly Run Docker on RHEL and Friends

tl;dr: use the lvm-direct storage driver. lvm-loopback is for prototyping and Does Not Scale.

I was inspired to write this after reading thehftguy’s hilarious posts about how much he loves docker.

At $day_job we use docker on some of our jenkins slaves to build rpms and to support test execution for apps leveraging docker. Naturally this is a dev environment, so we’re not talking about running docker in production or anything Crazy. But we are talking about devs with pitchforks when jenkins jobs hang, and hang, and hang…

Everything was running smoothly for the most part. Every couple of days a docker node would lock up, with cpu and load avg spiking through the roof. We'd reboot it and everything'd be back to normal. And we accepted that for a while.

It’s funny what you find when you go diving through logs.

Wait, what? You mean it’s not as easy as yum install docker? But I thought you said the path to the land of milk and honey was paved with docker? Hmmm :\

So turns out storage with docker is kinda funny. Look for yourself, here’s the storage brochure…

https://docs.docker.com/engine/userguide/storagedriver/selectadriver/

It started with AUFS, then some heroes at Red Hat wrote the device mapper driver (thank you!), and now there’s overlay, overlay2, btrfs, zfs, all-kinds-of-fsss.

Here’s a fantastic article explaining the nitty-gritty on the history and development of the device mapper drivers.

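The short version is, out-of-the-box docker on a rhel/centos system uses the lvm-loopback storage driver (Docker’s docs call this the devicemapper driver in loop-lvm mode; lvm-direct is the same driver in direct-lvm mode). It’s quick, it’s easy and it works at small scales.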

If you wanna run real workloads on docker (on rhel/centos) you HAVE to use the lvm-direct driver. lvm-direct runs on an LVM thin pool backed by a real block device, so it’s not your average lvm setup task.
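If you've never bumped into thin provisioning before, one quick way to see whether a box already has a thin pool is the attribute column from lvs: for a thin pool it starts with a lowercase `t`. Here's a rough sketch, nothing official. `list_thin_pools` is a made-up helper name, and all it does is filter the text that `lvs --noheadings -o lv_name,lv_attr` prints.

```shell
#!/bin/sh
# Sketch: list LVM thin pools from `lvs` text on stdin.
# Expects two columns per line: lv_name lv_attr.
# list_thin_pools is a hypothetical helper, not part of LVM or docker.
list_thin_pools() {
    # The first character of lv_attr is the volume type;
    # a lowercase "t" marks a thin pool.
    awk '$2 ~ /^t/ { print $1 }'
}

# Usage on a real node (needs root):
#   lvs --noheadings -o lv_name,lv_attr | list_thin_pools
```

If that prints nothing, there's no thin pool on the node yet.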

Here’s a quick-and-dirty way to set up the lvm-direct driver on a New node:

1. Add an additional, unpartitioned, un-filesystemed block device to your node OR, when you provision the node, leave a percentage of the lvm volume group free (see useful links).
2. As of Red Hat’s docker 1.12 packages, just run the docker-storage-setup service. This Should detect your additional block device/free vg space and configure the thin-pool automagically.
3. That’s it. Start the docker daemon and validate: `docker info 2> /dev/null | grep loop` should return nothing. Run a container and see what happens.
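Strung together, the steps above might look something like the sketch below. Treat it as a starting point under assumptions, not a recipe: `/dev/vdb` and `docker-vg` are placeholder names, `write_storage_conf` and `check_storage` are made-up helpers, and it assumes your docker-storage-setup reads `DEVS` and `VG` from `/etc/sysconfig/docker-storage-setup`. You only need the config step if auto-detection doesn't pick up your device.

```shell
#!/bin/sh
# Sketch of the three steps above. /dev/vdb and docker-vg are placeholders
# (assumptions), not values from this post. Helper names are hypothetical.

# Step 1: write a docker-storage-setup config naming the spare device.
# Args: device, volume group name, config file path.
write_storage_conf() {
    printf 'DEVS=%s\nVG=%s\n' "$1" "$2" > "$3"
}

# Step 3: read `docker info` text on stdin and report whether any
# loop files are mentioned (same test as the grep above).
check_storage() {
    if grep -q 'loop'; then
        echo "still on loopback"
    else
        echo "no loop devices"
    fi
}

# On a real node, as root (adjust the names first):
#   write_storage_conf /dev/vdb docker-vg /etc/sysconfig/docker-storage-setup
#   docker-storage-setup        # step 2; or start the docker-storage-setup service
#   systemctl start docker
#   docker info 2>/dev/null | check_storage
```

The helpers only touch the file path and text you hand them, so you can try them against a scratch file before pointing anything at a real device.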

It’s been 27 days since switching to lvm-direct and no sign of the angry ‘why are my docker jobs hanging’ mob.

I suppose the real message here is you can’t just yum install $new_hotness and expect it to work. Perhaps if someone had RTFM we could’ve skipped this whole painful lesson. Naaaa…

Some useful links: