What Is GitOps And Why Is It (Almost) Useless? Part 1

On a new airline. A stewardess enters the passenger cabin: "You are on our new airline. In the nose of the aircraft, we have a cinema hall. In the tail part - a hall of slot machines. On the lower deck - a swimming pool. On the upper deck - a sauna. Now, gentlemen, fasten your seatbelts, and with all these unnecessary things, we're gonna try to take off.

Hi, my name is Andrii. I've been working in the IT industry for most of my life. I'm very interested in the evolution of infrastructure configuration management engineering. For the last 8 years, I've been involved in DevOps.

One of the fresh popular trends is the concept of GitOps, introduced in 2017 by Alexis Richardson, the CEO of Weaveworks. Weaveworks is a large adult company that, in 2020, raised over 36 million in investment to develop its GitOps.

My previous article discussed a cost-cutting success story on how we switched from Elastic Stack to Grafana. Now, I'm going to try to talk about the non-obvious challenges that may await you when adopting this concept. In short, GitOps is not a "Silver Bullet". You'll likely end up reorganizing with many complicated workarounds. I've been down this road myself and want to show you the most frustrating problems you can't see when reading other articles about GitOps.

Content Overview

What GitOps is and why you (don’t) need it
Snowflake Servers Issue
GitOps - A Panacea for all Your Problems (or not)
The Logic of Using Flux with Helm
Custom Flux Resources
Checklist for GitOps
Violation of the Single Source of Truth Concept
Small Conclusion

What GitOps is and why you (don`t) need it

Let’s dive right in!

Stateless and Stateful

The most promising concept of infrastructure construction today is immutable infrastructure. Its key idea is to divide infrastructure into 2 fundamentally different parts: Stateless and Stateful.

The Stateless part of the infrastructure is immutable and idempotent. It does not accumulate state (does not store data) or change its operation depending on the accumulated state. Instances of the Stateless part may contain some basic artifacts, scripts, and assemblies. As a rule, I create them from base images in cloud/virtualized environments. They are fragile and ephemeral: I deliver new versions of applications by recreating instances from new base images.

Persistent data is stored in the Stateful part. It can be realized by the classical scheme with dedicated servers or by some cloud mechanisms (DBaaS, object, or block storage).

To make this “zoo” manageable and work correctly, we need collaboration between engineering and DevOps teams, as well as fully automated delivery pipelines.

CI part

Extreme programming is one of the agile development methodologies. It is distinguished by many feedback loops, which allow you to maintain synchronization with the client's needs.

We implement the automation of delivery pipelines using CI/CD systems. The term CI (Continuous Integration) was offered by Grady Booch in 1994, and in 1997 Kent Beck and Ron Jeffries introduced it into the discipline of extreme programming. In CI, we need to integrate our changes as often as possible into the main working branch of our project.

This requires, first, a more granular decomposition of tasks: small changes are more atomic and easier to track, understand, and integrate. Second, we can't just merge freshly written code. Before merging branches, we must ensure that nothing that worked before has been broken. To do this, the application should at least be built. It is also a good idea to cover the code with tests.

And this is the task performed by CI systems, which have gone a long way in the development and, somewhere in the middle of this path, turned into CI/CD systems.

CD part

What is a CD? Martin Fowler distinguishes 2 CD definitions:

Continuous Delivery. This is when, with the help of Continuous Integration practices and DevOps culture, you keep the main branch of your project constantly ready to be deployed to production.
Continuous Deployment. It is Continuous Delivery where everything that goes into the main branch gets dumped into your cluster, into your production.

Let’s go further.

Snowflake servers issue

Unfortunately, immutable infrastructure has several problems. The lion's share of them is inherited from the concept of Infrastructure as Code (IaC).

First of all, it is configuration drift. This term was born in the Puppet Labs (authors of the well-known Puppet SCM) and states that not all changes on target systems are made with the help of system configuration management (SCM). Some are done manually, bypassing them.

In the process of such multiple changes, configuration drift appears - the difference between the configuration described in SCM and the real state of affairs.

This leads to an automation fear spiral.

The more manual changes made, the more likely that running an SCM script will break unrecorded changes. The scarier it is to run it, the more likely new manual edits will be made.

Eventually, this vicious positive feedback leads to the formation of snowflake servers, which have become so inconsistent that no one understands what's inside anymore. After manual edits, the node becomes as unique as each individual snowflake in a snowfall.

This drift leaves the servers at higher levels within immutable infrastructure: now we can talk about GCP Project/AWS VPC/Kubernetes-cluster-snowflakes. This happens because the implementation of changes is not regulated on immutable infrastructure. Moreover, nobody knows how to do it properly.

GitOps - a panacea for all your problems (or not)

And then Weaveworks comes along and says, "Guys, we have what you need - GitOps". To promote GitOps, they brought in a heavyweight like Kelsey Hightower, who created the "Kubernetes the hard way" guide. During his PR, he heavily broadcasts the message, "Be a man, b...! Stop Scripting and Start Shipping." And he says a certain amount of marketing bullshit bingo.

In my opinion, the most exciting benefits were:

Improved consistency and standardization of deploys
Improved security assurance
Easier and faster recovery from errors
Easier management of accesses and secrets
Self-documenting deploys
Knowledge distribution within the team

And anyone trying to figure out what GitOps is comes across this textbook slide.

Next, we find the GitOps principles, which resemble slightly augmented IaC principles:

GitOps is declarative
GitOps apps are versioned and immutable
GitOps apps are pulled automatically
GitOps apps are continuously reconciled

Nevertheless, this is a spherical description in a vacuum, so we continue our research. We find the GitOps.tech website and on it several important clarifications.

First, we learn that GitOps is an infrastructure-like code in Git with CD tooling that automatically applies this to the infrastructure.

We must have at least 2 repositories within GitOps:

Application repository. It describes the application source code and manifests that describe the deployment of that application.
Infrastructure repository. It describes the infrastructure manifests and the deployment environment.

Also, in the GitOps ideology, a pull-oriented approach is preferred over a push-oriented approach. This is somewhat contrary to the evolution of SCM systems from the heavyweight pull monsters Puppet and Chef to the lightweight push-based Ansible and Terraform.

And if GitOps is primarily a toolkit story, then it makes sense to take the Flux-based concept from Weaveworks itself and deconstruct it. The authors of the idea must have made a reference implementation.

Flux is now up to version 2 and architecturally consists of controllers that work within a cluster:

Source controller
Kustomize controller
HELM controller
Notification controller
Image automation controllers

Next, let`s discuss the work with Flux and Helm.

The logic of using Flux with Helm

I'm going to further describe the example of deploying an application using Helm package manager in Flux 2.

Why? According to CNCF Survey 2021, HELM package manager was the most popular Packaging application, with a share of more than 50%.

Unfortunately, I couldn't find more up-to-date data, but I don't think anything much has changed since then.

So, let's walk through the basic logic of how Flux 2 works with Helm. We have 2 repositories: application and infrastructure.

We make a HELM chart and docker image from the application repository and add them into the Helm chart repository and docker registry, respectively.

Next, we have a Kubernetes cluster running the flux controllers.

To roll out our application, we prepare a YAML describing the custom resource (CR) HelmRelease and add it to the infrastructure repository.

To help flux get it, we create a CR GitRepository in the Kubernetes cluster. The source controller sees it, goes to git, and downloads it.

To deploy this YAML into a cluster, we describe a Customization resource.

The Kustomize controller sees it, goes to the Source controller, gets the YAML, and deploys it to the cluster.

The Helm controller sees that a CR HelmRelease has appeared in the cluster and goes to the Source controller to get the HELM chart described.

For the Source controller to give the HELM controller the requested chart, we must create a HelmRepository in the CR cluster.

Helm-controller gets a chart from Source-controller, creates a release, and deploys it to the cluster. Then Kubernetes creates the necessary pods, goes to the docker registry, and downloads the corresponding images.

Accordingly, to roll out a new version of our application, we have to make a new image, a new HelmRelease file, and possibly a new HELM chart. Then we must put them into the appropriate repositories and wait for the Flux controllers to repeat the work in the chain described above.

And, to end our work, we put a Notification controller somewhere that notifies us of what might have gone wrong.

Custom Flux resources

Now let's discuss the custom resources that Flux operates with.

The first one is the Git repository. Here we can specify the address of the Git repository (line 14) and the branch it looks at (line 10).

Thus, we only download a single branch, not the whole repository. But! Since we are responsible engineers and try to adhere to the Zero Trust concept, we lock access to the repository, create a secret with a key in the Kubernetes cluster and give it to Flux so that it can go there (line 12).

Next is Kustomization. Here I want to draw your attention to the fact that the Kustomize controller from Flux and Kustomize from the authors of Kubernetes are 2 different things. I don't know why such disorienting naming was chosen, but it is important not to confuse them.

Kustomization is a way to deploy YAML (any) from a Git repository to a cluster. Here we have to specify the source where we put it from (line 12 - the name of the CR GitRepository described above), the directory where we take the YAMLs from (line 8), and we can specify the target namespace where to deposit them (line 13).

Next is the Helm release.

Here we can specify the name and chart version (lines 10,11). Here you specify variable values so Helm can customize the release from environment to environment (lines 15-19). This is an extremely important and necessary feature, as your environments may differ significantly. You also specify the source to take the Helm chart (lines 12, 13, 14). In this case, it is the Helm repository.

But! Since we are still responsible engineers, we also have close access to the Helm repository and give Flux a secret to get there (lines 7, 8).

Checklist for GitOps

So, let's do a little checklist to capture what we just went over. To start doing GitOps, we have to suddenly write a bunch of scripts (we do remember that Immutable infrastructure is all about fully automated delivery pipelines). So first of all, we have to create:

Script to build and push images to the Docker registry
Infrastructure Git repository
Account for CI system access to the infrastructure GIT repository
Script to generate and push the HelmRelease file
Helm Repository
Account for CI system access to the Helm repository
Script to build and publish Helm chart`
Flux account for infrastructure repository
Flux account for the Helm chart repository

Great, now you have a checklist for GitOps. Move on.

Violation of the Single Source of Truth Concept

Let's see what we get with our Helm release in general. It is quite evident that Git cannot be the only source of truth in this particular case. We have at least 2 resources, 2 artifacts outside of git, on which this Helm release depends:

Helm chart (lines 8-14)
Docker image (line 19)

And we can complicate things even more and specify the range of Helm chart versions.

In this case, Flux will monitor and set new Helm charts that appear within this range. In addition, the Source controller we have can use YAML as a source, including S3 bundles.

From there, we can keep both YAML and Helm charts.

In addition, we have Image automation controllers that can keep an eye on new images in the Docker registry and edit the infrastructure repository.

But we don't want HELM Chart repo-Ops or Docker registry-Ops. We want to be as GitOps as possible. So we look at the documentation and correct the processes to deploy our Helm chart from the GIT repository (we choose the application repository to store it).

This forces us to make another CR GitRepository for the application repository, an account for Flux to access it, and create a secret with keys.

At the same time, we do not solve the problem of a complicated dependency on Docker image in any way.

Small Conclusion

I think that's enough for today. In the 2 part, I will tell you what problems this goodness has. I will discuss:

Multiple environments problem
Values from
Secrets problem
CI Ops vs GitOps
Security
Rollback procedure
Multiple cluster problem
Who really needs GitOps?

I hope this article was useful for you!