Introduction to Istio and the Mechanics of mTLS

Hello everyone!

In this article, I will introduce you to Istio and the challenges that can arise when implementing it.

Let's say we have an application. It keeps evolving as the business brings in new tasks, which is a normal process. However, as traffic grows, tasks related to managing network traffic eventually appear. For example:

Let's take an application with a backend and a frontend. From the cluster's perspective, it looks like two groups of pods that communicate with each other. Sometimes one of the backend pods starts experiencing issues: not fatal ones, but enough to cause glitches in the application. However, the Kubernetes health checks do not detect this, so the pod stays in load balancing and keeps receiving traffic, which only makes things worse. It would be great to detect these borderline states, where everything seems fine but something is not right, so that we can give the pod some rest and let it return to load balancing with renewed strength.
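The "give the pod some rest" idea can be sketched as passive health checking: watch real responses, eject a backend after a run of errors, and readmit it after a pause. The Python below is only an illustration of that idea, not Istio's implementation; the class name, the thresholds, and the injectable clock are assumptions made for the example.

```python
import time


class OutlierDetector:
    """Ejects a backend after N consecutive errors, readmits it after a rest period."""

    def __init__(self, max_consecutive_errors=5, ejection_seconds=30.0, clock=time.monotonic):
        self.max_consecutive_errors = max_consecutive_errors
        self.ejection_seconds = ejection_seconds
        self.clock = clock  # injectable so the behaviour can be checked without sleeping
        self.errors = {}         # backend -> current run of consecutive errors
        self.ejected_until = {}  # backend -> time when it may receive traffic again

    def record(self, backend, ok):
        """Record the outcome of one request to a backend."""
        if ok:
            self.errors[backend] = 0
            return
        self.errors[backend] = self.errors.get(backend, 0) + 1
        if self.errors[backend] >= self.max_consecutive_errors:
            # Too many failures in a row: take the backend out of rotation for a while.
            self.ejected_until[backend] = self.clock() + self.ejection_seconds
            self.errors[backend] = 0

    def healthy(self, backends):
        """Return the backends that are currently allowed to receive traffic."""
        now = self.clock()
        return [b for b in backends if self.ejected_until.get(b, 0.0) <= now]
```

A load balancer would call record() after every response and choose only among healthy() backends, so a struggling pod stops receiving traffic even though its health checks still pass.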

Here's another example. Let's say you're in the cloud and you've spread your pods across different zones for fault tolerance. However, the pods still talk to each other without any regard for zone boundaries. Such cross-zone requests cost money and add latency.

Therefore, it would be appropriate for the frontends to choose backends that are located nearby in the same zone while keeping the remaining backends as backups. This routing method is called "Locality Load Balancing".
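As a rough sketch of this routing rule, the function below prefers backends in the caller's zone and falls back to the rest only when no local backend is available. The function name and the (name, zone) tuple format are assumptions for the example; a real mesh also weighs priorities and health, which is left out here.

```python
import random


def pick_backend(client_zone, backends, rng=random):
    """Pick a backend in the client's zone if one exists, else fall back to any zone.

    `backends` is a list of (name, zone) tuples that are currently healthy.
    """
    if not backends:
        raise ValueError("no healthy backends")
    local = [name for name, zone in backends if zone == client_zone]
    # Backends in other zones are used only as a backup when nothing local is left.
    candidates = local or [name for name, _ in backends]
    return rng.choice(candidates)
```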

There are numerous patterns like these for network management. That's precisely why Service Mesh was invented: as a systematic solution to such challenges. Simply put, Service Mesh gives us a set of building blocks from which we can construct our own network management patterns. In other words, Service Mesh is a framework for managing TCP traffic that comes with its own declarative language and the bonus of observability. With Service Mesh in place, we no longer need to handle the intricacies of network interactions at the application level. We can describe all of them in the mesh instead. That's what Service Mesh is all about.

How does it work?

Let's understand how it works. Imagine that we are fortunate enough to find ourselves in a parallel universe where Service Mesh hasn't been invented yet, and we are going to create it ourselves.

Let's get started. There's an application. It receives requests. It generates new requests. And it's this traffic that we want to manage. To control this traffic, we need to intercept it. That's what we're going to do now.

We step into the application's network namespace and deploy our interceptor there. Using DNAT (Destination Network Address Translation), we redirect incoming traffic to it, and we do the same with outgoing traffic. We don't need two separate interceptors for this: a single one can handle both directions.

Our application lives in Kubernetes. This means that the application resides in a Pod. In other words, our application runs inside a container, and a Pod can hold more than one container, which allows us to add our interceptor as a sidecar. It's very convenient. As a result, we have intercepted traffic at our disposal. Now we need to do something with it, modify it, and the obvious solution is to use some kind of proxy, for example NGINX, HAProxy, or Envoy. Let's choose Envoy. Now we just need to do something useful with this intercepted traffic.

Let's suppose we have chosen two patterns to implement. What's still missing is a tool to manage the sidecars. So we create a controller: we describe the desired behavior in a declarative language, and the controller goes through the sidecars and configures them accordingly. In the world of Service Mesh, such a controller is called the Control Plane.
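The relationship between the controller and the sidecars can be sketched in a few lines. This is a toy model under stated assumptions: the Sidecar and ControlPlane classes and the rule fields are hypothetical, and the "push" is a plain method call, whereas a real control plane streams configuration to proxies over the network.

```python
class Sidecar:
    """Stand-in for a proxy that accepts configuration pushed by the control plane."""

    def __init__(self, service):
        self.service = service
        self.config = {}

    def apply(self, config):
        self.config = dict(config)


class ControlPlane:
    """Holds declarative rules and pushes them to every registered sidecar."""

    def __init__(self):
        self.sidecars = []
        self.rules = {}  # destination service name -> rule dict

    def register(self, sidecar):
        """A new sidecar appears and immediately receives the current rules."""
        self.sidecars.append(sidecar)
        self._push(sidecar)

    def set_rule(self, service, rule):
        """Declare the desired behaviour for traffic sent to `service`, then reconfigure proxies."""
        self.rules[service] = rule
        for sidecar in self.sidecars:
            self._push(sidecar)

    def _push(self, sidecar):
        # In this sketch every sidecar gets the rules for all destinations it might call.
        sidecar.apply(self.rules)
```

The point of the design is that the user only edits the rules; the controller is responsible for making every proxy match them, including proxies that start later.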

Now, let's talk about the pitfalls of Istio. Before Istio was in the picture, everything was straightforward. The user generates a request. The frontend generates a child request to the backend. Simple as that. However, as soon as Istio enters the scene, the whole scheme becomes more complex. The user's request is intercepted by the frontend's sidecar. The sidecar thinks and decides to forward the request to the frontend. The frontend makes a child request. The sidecar intercepts it again, thinks once more, and sends the request to the backend. There, the request is intercepted again, this time by the backend's sidecar. Compared to a setup without a mesh, there are now many additional hops, and this cannot come without a cost. The most obvious cost is latency. Istio, however, promises no more than 2.65 milliseconds of overhead.

mTLS

What about security? Can we trust the mutual TLS provided by Istio? And what exactly is mutual TLS?

Mutual TLS (mTLS) is needed when we want the client and the server to verify each other's identity, not just the client verifying the server as in ordinary TLS. mTLS also encrypts the traffic between our applications. Technically, mTLS is built on regular X.509 certificates, the same kind used for ordinary TLS.

In the world of Istio, each Pod has its own certificate, which proves the Pod's identity. The identity is called the principal and consists of three parts: the trust domain (cluster.local by default), the namespace in which the application runs, and the Service Account under which the Pod runs.
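The three parts are combined into a SPIFFE-style URI, which is what ends up in the certificate. A small sketch of building and parsing such an identity is shown below; the helper names are made up for the example, and the first part (cluster.local, which Istio calls the trust domain) is passed in explicitly.

```python
def spiffe_id(trust_domain, namespace, service_account):
    """Build the SPIFFE URI that identifies a workload."""
    return f"spiffe://{trust_domain}/ns/{namespace}/sa/{service_account}"


def parse_spiffe_id(uri):
    """Split a SPIFFE URI back into (trust_domain, namespace, service_account)."""
    prefix = "spiffe://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a SPIFFE URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    # Expected layout: <trust domain>/ns/<namespace>/sa/<service account>
    if len(parts) != 5 or parts[1] != "ns" or parts[3] != "sa":
        raise ValueError(f"unexpected SPIFFE path: {uri!r}")
    return parts[0], parts[2], parts[4]
```

For a Pod running as the service account frontend in the namespace shop (both names hypothetical), the identity would be spiffe://cluster.local/ns/shop/sa/frontend.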

The issuance of these certificates is managed by the Istio control plane, which uses its own root certificate. This root certificate should not be confused with the root certificate of the Kubernetes cluster itself. Individual workload certificates are signed with this root certificate, which is what makes secure communication possible. But how does this process work?

Let's dive into the pod and take a look at the entire lifecycle of a certificate. Unlike in the service mesh we sketched earlier, in Istio Envoy does not communicate with the control plane directly. It goes through an intermediary called the Istio agent. The Istio agent is a small program that runs in the sidecar container alongside Envoy and is responsible for certificate rotation. How does it work? It generates a Certificate Signing Request (CSR) that asks for a certificate carrying the identifier specific to our pod. The identifier is assembled as follows: the Istio agent already knows which cluster it belongs to, while obtaining the namespace and the service account is slightly more involved.

ServiceAccount

Kubernetes has an API, and it is assumed that this API will not be accessed anonymously. In particular, it will be accessed by pods. Regular applications typically don't need to interact with the API directly; various operators and controllers do. Since this interaction cannot be anonymous, special accounts were introduced to solve the problem: they are called service accounts. When a service account is created, it doesn't carry any information at that moment. It simply exists and has a specific name.

As soon as such a resource appears in the cluster, Kubernetes reacts to it and issues a JWT that fully describes the corresponding service account. In other words, Kubernetes confirms: yes, in my cluster, in this namespace, there is a service account with this name. It provides a special token that is stored in a secret and associates this secret with the service account. Thus, the service account essentially becomes a notarized resource that can prove its own identity using the token. The service account can then be assigned to a pod. When we do that, the secret is mounted into the pod when the pod is created.
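To see what such a token "describes", you can decode its payload. The sketch below only inspects the claims and does not verify the signature, so it must never be used to make trust decisions. The helper names are assumptions; the subject format system:serviceaccount:<namespace>:<name> is the one Kubernetes uses for service accounts.

```python
import base64
import json


def jwt_claims(token):
    """Decode the payload of a JWT WITHOUT verifying its signature (inspection only)."""
    try:
        _header, payload, _signature = token.split(".")
    except ValueError:
        raise ValueError("a JWT has exactly three dot-separated parts") from None
    padded = payload + "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def service_account_from_claims(claims):
    """Extract (namespace, name) from a subject like system:serviceaccount:<ns>:<name>."""
    prefix, kind, namespace, name = claims["sub"].split(":")
    if (prefix, kind) != ("system", "serviceaccount"):
        raise ValueError("token does not belong to a service account")
    return namespace, name
```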

Even if you have never dealt with service accounts before, Kubernetes has already created a default service account in every namespace, so every pod runs under one. The token mounted into the pod is therefore very useful to us: from it we can learn the namespace we are running in and the service account we are running as. Essentially, the agent builds the identifier, places it in the CSR, and picks up the token.
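Reading the mounted files takes only a few lines. The directory below is the default location where Kubernetes mounts service account credentials; which token file Istio's agent actually reads depends on the Istio version and configuration, so treat the path and the function name as assumptions of this sketch.

```python
from pathlib import Path

# Default Kubernetes mount point for service account credentials inside a pod.
DEFAULT_SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def read_mounted_identity(sa_dir=DEFAULT_SA_DIR):
    """Return (namespace, token) from the service account files mounted into a pod."""
    base = Path(sa_dir)
    namespace = (base / "namespace").read_text().strip()
    token = (base / "token").read_text().strip()
    return namespace, token
```

The directory is a parameter so the function can be tried outside a cluster by pointing it at any folder that contains a namespace file and a token file.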

Now we have a CSR, and we can send it to the control plane. But if we send it alone, how will Istio know that it really comes from us? That's why we also include the token. Istio validates the token against the Kubernetes API using the TokenReview API, and if Kubernetes confirms that everything is fine, Istio signs the CSR and returns the resulting certificate to the sidecar. That's it.
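The check on the control plane side can be sketched as two steps: build a TokenReview object to submit to the Kubernetes API, then compare the authenticated user in the response with the identity requested in the CSR. The function names are hypothetical and the HTTP call itself is omitted; only the shape of the TokenReview resource (authentication.k8s.io/v1) is taken from Kubernetes.

```python
def token_review_request(token):
    """Build the body of a TokenReview request for the Kubernetes API."""
    return {
        "apiVersion": "authentication.k8s.io/v1",
        "kind": "TokenReview",
        "spec": {"token": token},
    }


def caller_is_allowed(review_response, expected_namespace, expected_service_account):
    """Check a TokenReview response against the identity claimed in the CSR."""
    status = review_response.get("status", {})
    if not status.get("authenticated", False):
        return False
    expected = f"system:serviceaccount:{expected_namespace}:{expected_service_account}"
    # Sign only if the token really belongs to the service account named in the CSR.
    return status.get("user", {}).get("username") == expected
```

Only when this check passes would the CSR be signed, which is what ties the certificate's identity to a service account that Kubernetes itself vouches for.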