Monitoring containerized microservices with a centralized logging architecture.

Written by e.uzziah | Published 2018/09/06
Tech Story Tags: docker | microservices | architecture | logging | logging-architecture


A case study of Project Horus

Having small specialized services working together to achieve a business goal is better than a giant monolithic service that does everything. That is the core premise of the microservices architecture.

However, monitoring microservices quickly becomes a challenge at scale, compared to a monolith where you only have to look in one place.

This article presents an architecture for managing logs that achieves the simplicity of a monolith, without sacrificing the robustness of microservices.

You can find an actual implementation of the concepts discussed here on my Project Horus repository. Oh, and there’s a hidden message in the illustration above; see if you can find it, otherwise keep reading till the end.

Why centralize logs?

Suppose you have an e-commerce system powered by a bunch of microservices like Authorization, Product catalog, and Billing. Now imagine a customer’s checkout process fails: how would you go about determining the cause? You could check the logs of each microservice and eventually find the one with issues. However, this approach does not scale past a few services.

The more practical approach is to gather the logs from each microservice in a central searchable database. That way, when something breaks, you have the complete story in one place, which decreases your Mean Time To Repair (MTTR).

Microservices as connected containers

Although microservices are just an architectural pattern, today the term is almost synonymous with containers. People were building microservice systems long before containers became popular, but those systems were difficult to deploy; this is where container platforms like Docker came to the rescue.

The rest of this article focuses on a concrete example of microservices implemented with Docker containers. However, the core ideas apply just as well to orchestration platforms like Kubernetes and Mesos.

Let us dive in.

The art of log collection

In the most basic sense, you log a message when an action occurs at some point in your code, typically using a logging library like Bunyan for JavaScript. The library is configured to send logs to whatever destination you want, like stdout, a local file, or a log aggregation service like Splunk.
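
For example, a minimal Bunyan setup emits newline-delimited JSON records to stdout by default. The service and field names below are purely illustrative, not taken from Project Horus:

    // logger.js -- illustrative only
    const bunyan = require('bunyan');

    // Bunyan writes structured JSON records to stdout unless told otherwise,
    // so the service itself stays ignorant of where its logs finally end up.
    const log = bunyan.createLogger({ name: 'billing-service' });

    log.info({ orderId: 'A-1042' }, 'charge accepted');
    log.error({ orderId: 'A-1042' }, 'payment gateway timed out');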

Each setup has its pros and cons, and an extensive treatment of each could make this article too long. So, let us go over some best practices that will form the foundations of our architecture.

  • Logs are streams of events continuously flowing. Files are inherently static objects. So it is a mismatch of abstractions to store logs in files. This mismatch manifests as an additional complexity of parsing log files to generate useful insights and dealing with file size and rotation policies.
  • A microservice should not need to know where its logs are going; the execution environment should handle that. That way, you can change the destination of your logs without modifying every single microservice (see the example just after this list). [Tip: Your microservices should log to stdout or stderr]
  • Logging should be plug-n-play. A developer should be able to create a microservice in whatever language or framework they desire, then drop it into the environment, and have logging working without fiddling with any configs.
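
To see the last two points in action, here is the same (made-up) service image pointed at two different log destinations purely from the outside, with no change to its code. The second command routes its output to a Fluentd endpoint, which we will meet properly below:

    # Default: the daemon hands the container's stdout/stderr to the json-file driver
    docker run -d my-org/billing-service

    # Same image, but its logs are now routed to a Fluentd endpoint instead
    docker run -d \
      --log-driver=fluentd \
      --log-opt fluentd-address=localhost:24224 \
      --log-opt tag=billing \
      my-org/billing-service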

So let’s now look at an architecture that achieves these goals.

The dedicated log shipper architecture

Dedicated log shipper architecture — Uzziah Eyee

There is quite a bit going on in the diagram above, so let us approach it top-down.

Conceptually

We have a couple of microservices each running in a container. The logs of each service are forwarded to a log-driver which eventually sends them to a dedicated log-shipping container. The shipper can manipulate the logs before persisting them in a store. Finally, the developer can query this datastore to visualize and analyze the logs. That’s the main gist.
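
To make that concrete, here is a heavily trimmed docker-compose sketch of the same topology. It is illustrative only; the image names and addresses are placeholders rather than the actual Project Horus configuration, and in practice you would build a Fluentd image with the Elasticsearch output plugin installed:

    version: "3"
    services:
      fluentd:                        # the dedicated log-shipping container
        image: fluent/fluentd
        ports:
          - "24224:24224"             # accepts logs forwarded by the fluentd log-driver

      billing:                        # one of the microservices
        image: my-org/billing-service
        logging:
          driver: fluentd             # the daemon routes this container's stdout/stderr here
          options:
            fluentd-address: localhost:24224   # resolved by the daemon on the host
            tag: billing
        depends_on:
          - fluentd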

I am handwaving over a bunch of things here, though: how do the containers ‘know’ to send their logs to a log-driver, and what is a log-driver anyway? Aren’t we breaking one of the aforementioned best practices, namely that a microservice should not know where its logs are going? Why are we using a dedicated log-shipping container? And what are Fluentd, Elasticsearch, and Kibana?

Let us answer these questions.

How logging works in Docker

  • When the docker daemon runs a container, it captures the container’s stdout and stderr streams and forwards them to a log-driver. By default, it uses the driver specified in the daemon.json file. However, you can specify a different driver for each container during launch.
  • On receiving a log stream, the log-driver can do whatever it likes. For instance, the default log-driver json-file persists the logs from each container to a file on the host machine.
  • The daemon ships with a few log-drivers, but you can add more using plugins. You can see the active log-driver and the installed plugins with the command docker info, then search for “Logging Driver” and “Plugins”. An example of the daemon-level configuration follows this list.
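
For example, making Fluentd the default driver for every container on a host is a change to /etc/docker/daemon.json; the address below is a placeholder:

    {
      "log-driver": "fluentd",
      "log-opts": {
        "fluentd-address": "localhost:24224"
      }
    }

After restarting the daemon, docker info | grep "Logging Driver" should report fluentd.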

Aha! Now you understand why it is a best practice for microservices to log to stdout: it makes them very portable because we delegate the responsibility of log routing to the environment, that is, the Docker daemon and its log-driver.

Putting it all together

  • The Docker daemon forwards each container’s stdout/stderr stream to the fluentd log-driver, one of the drivers that ships with Docker.
  • The fluentd log-driver is configured (via the fluentd-address option) to send the logs to a TCP address, by default localhost:24224, on which the dedicated log-shipping container is listening.
  • On receiving the logs, the shipper, a container running the Fluentd application, parses, aggregates, and forwards them to an Elasticsearch cluster hosted as a service (a minimal configuration sketch follows this list). Alternatives to Fluentd here include Logstash and Filebeat.
  • Elasticsearch indexes the logs.
  • The developer then uses Kibana to query Elasticsearch and create cool visualizations from the log data. We expect these visualizations to provide quick insights for issue prevention and resolution.
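
Here is a minimal sketch of what the shipper’s fluent.conf could look like, assuming the fluent-plugin-elasticsearch output plugin is installed in the Fluentd image; the host, prefix, and match pattern are placeholders:

    # Accept logs forwarded by the Docker fluentd log-driver
    <source>
      @type forward
      port 24224
      bind 0.0.0.0
    </source>

    # Ship everything to Elasticsearch using time-based (logstash-style) indices
    <match **>
      @type elasticsearch
      host elasticsearch.example.com
      port 9200
      logstash_format true
      logstash_prefix fluentd
    </match>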

The critical thing to note here is that we are treating the logs as a stream, channeling them to the final persistent store without staging them in intermediate files.

We use a dedicated log shipping container so we can centralize business logic like obfuscating Personally Identifiable Information (PII) or changing the log format. You want a single source of truth for such logic. Additionally, having everything containerized avoids needing to install a log shipping agent on the host machine.
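
As a sketch of what such centralized logic might look like, the record_transformer filter that ships with Fluentd can rewrite fields before they reach the store; the tag and field name below are assumptions made for illustration:

    # Mask the local part of any `email` field on records tagged `billing`
    <filter billing>
      @type record_transformer
      enable_ruby true
      <record>
        email ${record["email"].to_s.gsub(/.+@/, "***@")}
      </record>
    </filter>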

Finally, we store the logs in Elasticsearch instead of a regular file or database because Elasticsearch is a search engine designed to index and efficiently retrieve large document collections. Kibana is a frontend for Elasticsearch. It can be used to create charts and dashboards from the query results of ES.
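
For instance, once the logs are indexed you can query Elasticsearch directly over its search API. The index pattern and field names below are assumptions: they presume the logstash-style indices from the earlier Fluentd sketch, the log field the fluentd log-driver attaches to each record, and an @timestamp field mapped as a date:

    # Fetch the 20 most recent records whose log line mentions "error"
    curl -s 'http://localhost:9200/fluentd-*/_search?size=20' \
      -H 'Content-Type: application/json' \
      -d '{
        "query": { "match": { "log": "error" } },
        "sort": [ { "@timestamp": "desc" } ]
      }'

Kibana builds this same kind of query for you behind a point-and-click interface.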

There we have it

We examined the problem of getting insight into the operations of microservices in a containerized environment, reviewed a few logging strategies and best practices, and finally brought it all together by looking at a concrete implementation that solves most of these problems.

Thanks for reading so far, I hope this helps you with your next microservices project. If you think I missed something, please let me know in the comments.

Oh, about the challenge to find the hidden message in the featured illustration: the image shows the “Observer” in the Situation Room of the 51 Pegasi System. The connected microservices are actually stars forming the constellations Libra and Virgo, from left to right. These are the zodiac constellations of September, the month this article was published.

