Beyond Container Orchestration - Kublr's Approach to Kubernetes Infrastructure Abstraction

[This post is inspired by an interview with Kublr CTO, Oleg Chunikhin]

Let’s face it: organizations want to run applications in multiple environments, period. While most organizations start using Kubernetes in the cloud, we are seeing many move to multi-environment setups (hybrid or multi-cloud), and it’s Kubernetes that is enabling this.

Kubernetes is inherently infrastructure- and technology-agnostic. It can run on virtually any machine (as long as it runs Linux or, more recently, Windows) on any infrastructure, and it is compatible with a broad range of cloud-native tooling. Organizations are adopting Kubernetes through different tools, each with its own Kubernetes 'flavor'.

Some of these flavors, however, diminish some of Kubernetes' core abilities. To retain Kubernetes' openness and flexibility, Oleg Chunikhin, CTO at Kublr, an enterprise-grade Kubernetes platform, explains that we need to shift our focus from seeing Kubernetes purely as a container orchestrator to leveraging it as an infrastructure abstraction layer.

In other words, rather than using Kubernetes simply to orchestrate your applications (which, depending on your 'flavor', can still be infrastructure-dependent!), you can use it to abstract the infrastructure away from the application stack.

By interacting only with Kubernetes (whether it runs on AWS, on-prem, or in a hybrid setup is irrelevant), developers can leverage any Kubernetes-compatible stack. Chunikhin says, “Not only does this make your stack a lot more portable, but it also adds flexibility along with building a much-needed future-ready architecture.”

Accidental abstraction?

As we all know, Kubernetes orchestrates containers, and to do its job properly, it decouples dependencies, making each piece of your stack practically autonomous. Whether or not this infrastructure abstraction is an accidental benefit, it’s time to take a closer look at this powerful ability.

Chunikhin says “the main reason behind Kubernetes' unprecedented success is infrastructure abstraction and the convenience with which Kubernetes provides it.” Kubernetes accomplishes this, he explains, by providing an abstraction layer consisting of standard representations of the resources offered by different infrastructure providers. You can then use this layer to run containers, custom business applications, middleware, and even system-level applications such as Ceph, GlusterFS, Rook, or MySQL for data storage.

These “standard representations” that Kubernetes provides by default can also be referred to as raw resource abstractions. They include nodes, overlay networks, services, ingress rules, and PVs/PVCs (Persistent Volumes and Persistent Volume Claims).
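
To make the PV/PVC split more concrete, the Python sketch below models how a claim might be matched to a volume. It is a deliberately simplified assumption of the rules the Kubernetes volume controller applies (same storage class, requested access modes available, enough capacity, preferring the smallest volume that fits). The class names, field names, and volume names are illustrative only and are not the Kubernetes API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, heavily simplified model of PV/PVC binding.
# It is NOT the Kubernetes API; it only illustrates the decoupling.


@dataclass
class PersistentVolume:
    """Infrastructure side: a real disk, e.g. a cloud block volume or a Ceph image."""
    name: str
    storage_class: str
    capacity_gi: int
    access_modes: frozenset
    bound_to: Optional[str] = None


@dataclass
class PersistentVolumeClaim:
    """Application side: a request that never names the backing infrastructure."""
    name: str
    storage_class: str
    request_gi: int
    access_modes: frozenset


def bind(claim: PersistentVolumeClaim, volumes: list) -> Optional[PersistentVolume]:
    """Bind the claim to the smallest unbound volume that satisfies it, or return None."""
    candidates = [
        v
        for v in volumes
        if v.bound_to is None
        and v.storage_class == claim.storage_class
        and v.capacity_gi >= claim.request_gi
        and claim.access_modes <= v.access_modes  # every requested mode is offered
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda v: v.capacity_gi)
    best.bound_to = claim.name
    return best


if __name__ == "__main__":
    pool = [
        PersistentVolume("aws-ebs-1", "standard", 100, frozenset({"ReadWriteOnce"})),
        PersistentVolume("ceph-rbd-1", "standard", 20, frozenset({"ReadWriteOnce"})),
    ]
    claim = PersistentVolumeClaim("db-data", "standard", 10, frozenset({"ReadWriteOnce"}))
    print(bind(claim, pool).name)  # ceph-rbd-1 (smallest volume that fits)
```

The application only states what it needs (a 10 Gi `ReadWriteOnce` volume of class `standard`). Whether that request lands on a cloud disk or a Ceph image is decided entirely on the infrastructure side.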

Beyond these, Kubernetes affords further levels of abstraction through operator frameworks and a pluggable infrastructure architecture. Operator frameworks include the Operator SDK along with CRDs (Custom Resource Definitions) and API extensions, while pluggable architecture capabilities include cloud providers, cluster autoscalers, CNI (Container Network Interface), CSI (Container Storage Interface), and the Cluster API.

This boils down to at least two ways in which Kubernetes lets you use capabilities from different providers, which in a lot of cases is two ways more than what your current setup allows for. Chunikhin calls operator frameworks “an additional customization level, over and above what Kubernetes gives you out of the box.”

Encapsulating experience

These abstractions allow the different departments of an organization to focus on what they do best. Operations teams can now focus directly on providing these abstractions instead of fine-tuning infrastructure separately to meet each department's requirements.

Similarly, thanks to these abstractions, development teams can focus on writing software while still retaining a lot of control over how infrastructure is set up.

While both raw resource abstractions and pluggable architecture give you a very flexible framework that can be plugged into any infrastructure, operator frameworks allow you to encapsulate knowledge and experience that pertain to managing specific components of your application.

They do this by automating many operational tasks, which comes in especially handy for managing data storage systems. This also makes it feasible for smaller organizations to run self-hosted storage.

Chunikhin uses a simple yet nontrivial data storage case as an example, with two open source tools: Ceph and Rook. Ceph is a distributed storage system, and Rook is an operator framework that makes it easy and uniform to run systems like Ceph and Cassandra in a Kubernetes cluster.

Rook allows applications to mount block devices and file systems that it then manages “automatically.” Here, automatically means handling any updates, failure recovery, or scaling that these block devices and file systems need, without human intervention. Rook does this by automating the configuration of storage components and then monitoring the cluster to ensure both health and availability.
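
This kind of “automatic” management follows the general operator pattern: a controller repeatedly compares the desired state declared in a custom resource with what it observes in the cluster, then acts to close the gap. The Python sketch below shows that reconcile step for a hypothetical storage cluster. `StorageClusterSpec`, `ObservedState`, the daemon naming scheme, and the action strings are all invented for illustration and are not Rook's actual API or behavior.

```python
from dataclasses import dataclass, field

# Hypothetical reconcile step in the style of a Kubernetes operator.
# Names and actions are illustrative only; this is not Rook's implementation.


@dataclass
class StorageClusterSpec:
    """Desired state, as a user might declare it in a custom resource."""
    version: str
    replicas: int


@dataclass
class ObservedState:
    """What the operator currently sees: daemon name -> (version, healthy)."""
    daemons: dict = field(default_factory=dict)


def reconcile(spec: StorageClusterSpec, observed: ObservedState) -> list:
    """Plan the actions that move the observed state toward the spec.

    A real operator would apply these through the Kubernetes API and run
    reconcile again on the next event; here we only return the plan.
    """
    actions = []
    names = sorted(observed.daemons)
    keep, extra = names[: spec.replicas], names[spec.replicas :]

    for name in keep:
        version, healthy = observed.daemons[name]
        if not healthy:
            actions.append(f"replace {name}")  # failure recovery
        elif version != spec.version:
            actions.append(f"upgrade {name} to {spec.version}")  # rolling update

    for name in extra:
        actions.append(f"delete {name}")  # scale down

    index = 0
    for _ in range(spec.replicas - len(names)):  # scale up (no-op if negative)
        while f"daemon-{index}" in observed.daemons:
            index += 1
        actions.append(f"create daemon-{index}")
        index += 1

    return actions
```

For example, with a spec of version `v2` and three replicas, and an observed state of a healthy `v1` daemon plus an unhealthy `v2` daemon, the plan is to upgrade the first, replace the second, and create a third. Once the cluster matches the spec, `reconcile` returns an empty list. That idempotence is what lets an operator run continuously without human intervention.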

Other examples include operators for Kafka and HDFS, both of which have become even more powerful with the launch of Red Hat's Operator Framework.

Integrating infrastructure

A pluggable infrastructure abstraction works by efficiently exposing your provider’s capabilities to the applications and services running on your cluster. This means you can still leverage “managed” services from your cloud provider but keep the ability to switch to popular open source or other tools if (or rather when) needed.
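
One way to picture this is as a narrow interface that applications code against, with interchangeable provider implementations chosen by the platform team. That is roughly how a Kubernetes StorageClass lets the same claim be served by different provisioners. The Python sketch below is an analogy under that assumption only. `VolumeProvisioner`, `ManagedCloudDisks`, `SelfHostedCeph`, `STORAGE_CLASSES`, and the returned URI strings are hypothetical stand-ins, not real Kubernetes or CSI interfaces.

```python
from typing import Protocol

# Hypothetical analogy for pluggable infrastructure; not the CSI or StorageClass API.


class VolumeProvisioner(Protocol):
    """The narrow contract every storage backend must satisfy."""

    def provision(self, name: str, size_gi: int) -> str: ...


class ManagedCloudDisks:
    """Stand-in for a cloud provider's managed block storage."""

    def provision(self, name: str, size_gi: int) -> str:
        return f"managed-disk://{name}?size={size_gi}Gi"


class SelfHostedCeph:
    """Stand-in for an open source backend such as Ceph run in-cluster."""

    def provision(self, name: str, size_gi: int) -> str:
        return f"ceph-rbd://{name}?size={size_gi}Gi"


# The platform team decides which backend sits behind each class name;
# applications only ever refer to the class name.
STORAGE_CLASSES: dict[str, VolumeProvisioner] = {
    "fast": ManagedCloudDisks(),
    "regulated": SelfHostedCeph(),
}


def request_volume(storage_class: str, name: str, size_gi: int) -> str:
    """An application-facing request: no provider details leak through."""
    return STORAGE_CLASSES[storage_class].provision(name, size_gi)
```

Here `request_volume("fast", "db", 10)` returns a managed-disk handle today. Re-pointing `STORAGE_CLASSES["fast"]` at `SelfHostedCeph()` changes where volumes come from without touching any application code that calls `request_volume`.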

In fact, Chunikhin encourages everyone to use managed services where appropriate. What he advises against is an architecture that ties you to an infrastructure and service stack that won't let you adapt to market demands.

However, many organizations can’t afford such services, not in the monetary sense, but because of regulations and legacy requirements. Using data storage for a data science application as an example, he explains how the ability to rely on multiple data storage sources, as opposed to a single cloud-managed service, can be very valuable to organizations that must follow more stringent regulations.

Another aspect of data science is that users occasionally require expensive equipment, such as GPU instances, but can’t afford to reserve it all the time. When an organization wants to use its regular equipment for day-to-day work and elastically expand to the cloud to run occasional GPU experiments on larger data sets, Kubernetes’ pluggable infrastructure is a very convenient option.
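
The burst-to-cloud idea can be sketched as a placement decision: run a GPU job on existing on-prem capacity when it fits, and only otherwise add a cloud GPU node, loosely similar to how a cluster autoscaler adds nodes for pods that cannot be scheduled. The pool names, the `NodePool` fields, and the simple first-fit rule below are hypothetical. The real Kubernetes Cluster Autoscaler is considerably more involved.

```python
from dataclasses import dataclass, field

# Hypothetical first-fit placement with cloud bursting; not the Cluster Autoscaler.


@dataclass
class NodePool:
    name: str
    on_prem: bool
    gpus_per_node: int
    max_nodes: int  # for fixed on-prem hardware, set this to the current node count
    free_gpus_per_node: list = field(default_factory=list)  # one entry per running node


def place(job_gpus: int, pools: list):
    """Return (pool name, scaled_up) for the job, or None if it cannot run anywhere."""
    # On-prem pools sort first (False < True), so cloud GPUs absorb only overflow.
    ordered = sorted(pools, key=lambda p: not p.on_prem)

    # First pass: reuse free GPUs on nodes that already exist.
    for pool in ordered:
        for i, free in enumerate(pool.free_gpus_per_node):
            if free >= job_gpus:
                pool.free_gpus_per_node[i] -= job_gpus
                return pool.name, False

    # Second pass: add a node to a pool whose nodes are large enough.
    for pool in ordered:
        if pool.gpus_per_node >= job_gpus and len(pool.free_gpus_per_node) < pool.max_nodes:
            pool.free_gpus_per_node.append(pool.gpus_per_node - job_gpus)
            return pool.name, True

    return None
```

With a fixed two-node on-prem rack and an initially empty cloud GPU pool, small jobs stay on the rack. A job that needs more GPUs than any rack node has triggers a new cloud node, and later jobs reuse that node's spare GPUs before anything else is provisioned.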

There are two ways to integrate infrastructure. There is the PaaS way, where you integrate your application lifecycle and tools into your platform, and there is the Kubernetes way, where you keep everything “clean” and separate. In short, it is a layered architecture versus bundling several layers together.

Keeping the container orchestration layer separate from the middleware and managed services ensures that your security and governance are at their most effective. As Chunikhin puts it, "We built the Kublr Platform based on these very principles. As an architect by training, my focus has always been on cleanly separating the architectural layers. As soon as you start tying them to higher or lower layers, you compromise flexibility; it's architecture 101, really."

In conclusion, Kubernetes' ability to abstract away the infrastructure represents a huge opportunity to future-proof your new IT system. While Kubernetes inherently provides infrastructure abstractions, how it's configured will ultimately determine whether you leverage or inhibit that ability. "Whichever Kubernetes route you take," Chunikhin points out, "we recommend ensuring that a layered architecture is implemented." Architecture will ultimately determine the longevity of your solution: the more flexible and pluggable it is, the more likely it is to adapt to your evolving needs.