Starting Small With Kubernetes and Kubeadm: 500 Containers?

Written by vurtcloud | Published 2017/06/24


At Vurt we are performing some internal experiments with Kubernetes. In this test and post, we are specifically discussing kubeadm as the installer and also running k8s nodes on baremetal servers. Oh and ZFS too.

We believe Kubeadm is an important project. Vurt’s team has considerable experience operating OpenStack clouds, and we see some stark differences in how the k8s community has approached operational decisions versus what the OpenStack community has historically done.

This is not to say that one or the other is correct; rather, there are different choices that can be made.

OpenStack: No Default Deployment Option

OpenStack made a clear choice not to back one particular deployment mechanism. So there are many, and new ones all the time. For example, OpenStack Helm is a recently created project that deploys an OpenStack cloud’s control plane using Helm and Kubernetes. There are probably five or six major installation and management systems for OpenStack.

To make a long story short, this is option #1: No default deployment system. Instead we get many “installers”, and, in turn, a somewhat fractured view of how to deploy OpenStack. (Let’s not even get into Day 2 operations.) But this post isn’t about OpenStack, it’s about k8s and kubeadm.

Kubernetes: Kubeadm

Kubernetes, on the other hand, has decided to implement a default deployment mechanism: kubeadm. Certainly there are many other “distros” and “installers” but kubeadm is a real thing.

Over at SIG-cluster-lifecycle, we’ve been hard at work the last few months on kubeadm, a tool that makes Kubernetes dramatically easier to install. We’ve heard from users that installing Kubernetes is harder than it should be, and we want folks to be focused on writing great distributed apps not wrangling with infrastructure! — k8s blog

(An important part of that blog post is where they mention that kubeadm will not provision servers, i.e. it deploys k8s onto existing Linux servers. So the team has decided to limit the scope ever so slightly.)

Basically, the k8s community has decided that there will be a default installer that comes as part of the k8s core code. This is an interesting, and potentially powerful, decision.
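To make that concrete, the workflow kubeadm offers is intentionally tiny. Here is a minimal sketch, assuming kubelet, kubeadm, and Docker are already installed on every server (the token and endpoint below are placeholders that kubeadm init prints for you):

# on the server that will become the master / control plane
sudo kubeadm init

# on each additional Linux server that should join the cluster,
# using the token and endpoint printed by `kubeadm init`
sudo kubeadm join --token <token> <master-ip>:6443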

Trying to Deploy 500 Containers

For one of our tests, we decided to perform a quick deployment of k8s using kubeadm on top of four servers: one virtual machine and three baremetal nodes. The VM and networking are managed by OpenStack; currently the physical nodes are deployed automatically, outside of OpenStack. The physical servers have two 1Gb NICs configured in a bond.
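For completeness, bonding a pair of NICs on an Ubuntu server of this era looks roughly like the ifupdown sketch below; the interface names and bond mode are assumptions for illustration, not necessarily what our nodes use:

# /etc/network/interfaces (requires the ifenslave package)
auto eno1
iface eno1 inet manual
    bond-master bond0

auto eno2
iface eno2 inet manual
    bond-master bond0

auto bond0
iface bond0 inet dhcp
    bond-slaves none
    bond-mode 802.3ad   # assumes LACP is configured on the switch side
    bond-miimon 100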

As far as computing resources go, the baremetal nodes have 128GB of memory and 32 CPUs. The VM will act as the k8s master and it only has 4GB of memory and 2 CPUs. Our goal was to deploy 500 containers.

It’s important to note that this test wasn’t really about just getting 500 containers up; it was more along the lines of exploring what kubeadm does and what we can do with its default settings, as opposed to “pushing the limits” of the hardware and software. We did not expect to be able to run 500 containers on three nodes without making some configuration changes, but we wanted to try!

The baremetal nodes also have a single SSD (for cache) and a SATA drive configured with ZFS. Further, Docker is configured on the nodes and has ZFS as the image backing engine. We would like to use ZFS if possible as our default storage system, and thus are evaluating the use of ZFS on Linux (ZoL) for use with Docker/Kubernetes.

ubuntu@bm-node-1:/var/log$ sudo zpool list
NAME           SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zpool-docker   840G   380M   840G         -     0%     0%  1.00x  ONLINE  -
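For anyone curious what wiring Docker to a pool like that involves, here is a minimal sketch for a fresh node. The device names and dataset layout are assumptions for illustration; only the pool name matches the zpool list output above:

# create the pool from the SATA drive with the SSD as a cache device (hypothetical devices)
sudo systemctl stop docker
sudo zpool create zpool-docker /dev/sdb cache /dev/sdc

# give Docker a dataset mounted where it keeps images and containers
sudo zfs create -o mountpoint=/var/lib/docker zpool-docker/docker

# tell Docker to use the zfs storage (graph) driver and restart it
echo '{ "storage-driver": "zfs" }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker
docker info | grep -i "storage driver"   # should report: zfs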

Due to the use of ZFS, we had to run kubeadm init with an option to skip the pre-flight checks. Here’s the warning notice.

[preflight] Some fatal errors occurred:
	unsupported graph driver: zfs
[preflight] If you know what you are doing, you can skip pre-flight checks with `--skip-preflight-checks`
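So the actual invocation ends up looking something like this (any other init flags you need would go alongside it):

# skip the pre-flight checks so the zfs graph driver is allowed
sudo kubeadm init --skip-preflight-checks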

Also, in this test we are using the (somewhat) default Weave networking plugin. The installation of Weave is fascinatingly easy!

ubuntu@kube-master:~$ kubectl apply -f https://git.io/weave-kube-1.6
serviceaccount "weave-net" created
clusterrole "weave-net" created
clusterrolebinding "weave-net" created
daemonset "weave-net" created

The kube-master node and networking were configured first, and then the baremetal nodes were initialized afterwards.
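Joining the baremetal nodes is the usual kubeadm join one-liner. The token and address below are placeholders for the values kubeadm init printed, and we assume the same --skip-preflight-checks flag is needed because the nodes run the same zfs-backed Docker:

# on each baremetal node
sudo kubeadm join --token <token> <master-ip>:6443 --skip-preflight-checks

# back on kube-master, confirm the nodes registered and Weave is running
kubectl get nodes
kubectl get pods -n kube-system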

Creating the Deployment

Once k8s was up and running (thanks kubeadm!) we used an example nginx deployment and started with a couple of pods. Then we scaled up, first 50 at a time, then 100… finally reaching 500.

ubuntu@kube-master:~$ cat deployment.yaml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 500 # tells deployment to run 500 pods matching the template
  template: # create pods using pod definition in this template
    metadata:
      # unlike pod-nginx.yaml, the name is not included in the meta data as a unique name is
      # generated from the deployment name
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
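The manifest above shows the final replica count; in practice we started much lower and stepped up, which with kubectl looks roughly like this:

# assumes the manifest was first applied with a small replica count
kubectl apply -f deployment.yaml
kubectl scale deployment nginx-deployment --replicas=50
kubectl scale deployment nginx-deployment --replicas=100
kubectl scale deployment nginx-deployment --replicas=250
kubectl scale deployment nginx-deployment --replicas=500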

These scaling actions went perfectly well until we tried to go from 250 to 500. The wall we ran into was that by default the number of pods per node is limited to 110.

ubuntu@kube-master:/var/log$ kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, cap: .status.capacity}'
{
  "name": "kube-master",
  "cap": {
    "cpu": "2",
    "memory": "4046412Ki",
    "pods": "110"
  }
}
{
  "name": "bm-node-1",
  "cap": {
    "cpu": "32",
    "memory": "131999476Ki",
    "pods": "110"
  }
}
{
  "name": "bm-node-2",
  "cap": {
    "cpu": "32",
    "memory": "131999472Ki",
    "pods": "110"
  }
}
{
  "name": "bm-node-3",
  "cap": {
    "cpu": "32",
    "memory": "131999636Ki",
    "pods": "110"
  }
}
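When the limit bites, the Deployment simply stalls short of its desired count; something like the following makes that visible (exact output will vary):

# DESIRED shows 500 while CURRENT/AVAILABLE stops at whatever the per-node
# pod limits allow; the remaining pods sit in Pending
kubectl get deployment nginx-deployment
kubectl get pods | grep -c Pending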

Looking at the kubelet docs, there is an option to set the pod limit.

--max-pods int32                                          Number of Pods that can run on this Kubelet. (default 110)
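For the follow-up test, one way to raise that limit on a kubeadm-provisioned node is a small systemd drop-in that feeds --max-pods to the kubelet. This is a sketch, assuming kubeadm’s own drop-in at /etc/systemd/system/kubelet.service.d/10-kubeadm.conf passes $KUBELET_EXTRA_ARGS to the kubelet (which it did in kubeadm releases of this era); the value 200 is just an example:

# on each node: add an extra drop-in rather than editing kubeadm's own file
cat <<'EOF' | sudo tee /etc/systemd/system/kubelet.service.d/20-max-pods.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--max-pods=200"
EOF
sudo systemctl daemon-reload
sudo systemctl restart kubelet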

Overall we were very happy with this particular test. In an hour or so we deployed a k8s cluster onto four nodes, deployed a software-defined networking system (Weave), and created 250 containers with almost zero effort. In future tests we will alter the per-node pod limit and see what happens! Certainly the baremetal nodes can run more, especially if those containers aren’t really doing anything.

Beta Customers

We are currently looking for a small number of beta customers to join us on our journey into advanced Kubernetes deployments on hosted, private infrastructure. Please email vurt@vurtcloud.com or use our contact form if that sounds like something your organization would like to be involved with.

