Logging Without Tears

Written by leobit | Published 2022/06/21


Leobit's CTO (the handsome guy you see above 😊) put his heart and soul into this article to teach you how to make logging great again. Keep reading, and you will learn standard practices that everyone should take advantage of when dealing with logging. The article will give you a rough idea of what path to follow if you ever have to set up logging principles on your project. Enjoy!

Log purpose

Problem statement 1

Hey %USERNAME%, yesterday %ANY_FEATURE_NAME% didn't work. There was absolutely no data displayed. The loader icon did not disappear for a few minutes. I had to cancel the demo, we can not proceed with this solution, we are losing customers. Fix this ASAP. URGENT!!!

Sometimes something goes wrong and we need to fix it, but in many cases the issue will be unreproducible in your local environment. Moreover, you may have no access to the production DB or even dumps (to reproduce it locally with dumped data). We definitely need logs here to figure out what happened. Let's set up a logging system for a non-complex web application, but first, we need to understand what logs are usually used for.

Logging is, without question, a necessary thing. It is usually used to:

  • understand what the system is doing right now;
  • trace the chronology of actions performed by the system;
  • record information about various events in the system (errors, abnormal situations, etc.);
  • and more.

Mandatory features of a well-designed logger are:

  • support for logging levels and filtering by these levels;
  • rotation of logs (by date and size, for example);
  • the ability to write logs not only to files (but also to a database, email, etc.);
  • dynamic reconfiguration of logging (watching the config file and applying changes at runtime).
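
For example, a minimal Serilog configuration covers the first three points out of the box (a sketch assuming the Serilog, Serilog.Sinks.Console, and Serilog.Sinks.File packages):

var logger = new LoggerConfiguration()
    .MinimumLevel.Information()                   // level filtering
    .WriteTo.Console()                            // more than one target...
    .WriteTo.File("logs/app-.log",
        rollingInterval: RollingInterval.Day,     // rotation by date...
        fileSizeLimitBytes: 10_000_000,           // ...and by size
        rollOnFileSizeLimit: true)
    .CreateLogger();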

When to log?

All the cases when we need logs can actually be covered by two scenarios:

  • something is happening, and we need to know what;
  • something happened, and we need to know what.

It is simpler with the first one because we can most likely add some logs ad hoc (using different techniques) to understand what is happening. It is a bit harder with the second one because you cannot add more logs after something has already happened.

So, in conclusion, try to log as much as possible. It is always easier to find something in a big heap if it is there than in a small one if it is not. At the same time, try to keep a balance: the overhead of logging usually decreases your app's performance, so be aware of it. And definitely use log levels.

Log levels

Depending on the logger, you can select the following levels of logging:

  • Debug (the lowest level required for debugging);

  • Info (informing about various actions of the system, steps of a process, for example);

  • Warn (non-critical bug in the system, or remarks to the developers in connection with the behavior of the program);

  • Error (an error that needs attention and solution, some failure);

  • Fatal (a very critical error that causes the system to crash and malfunction, usually to stop the process).

logger.Log(LogLevel.Debug, "Creating a new instance of user");
var user = _userService.CreateNewUser();
logger.Log(LogLevel.Debug, "User created");

try
{
  logger.Log(LogLevel.Information, "Setting up scope-related permissions");
  _permissionService.ConfigurePermissions(identity.Scopes, user);
}
catch (Exception ex)
{
  // Microsoft.Extensions.Logging has no Fatal level; Critical is its equivalent.
  logger.Log(LogLevel.Critical, ex, "Permissions could not be set");
  ...
}

In case your logger supports dynamic reconfiguration, you will be able to switch the default log level from Info to Debug even on a running environment to gather as much information as possible, and change it back to Info after you figure out the issue.
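
With Serilog, for example, this is done with a LoggingLevelSwitch (a minimal sketch; how you expose the switch, e.g., via an admin-only endpoint, is up to you):

using Serilog;
using Serilog.Core;
using Serilog.Events;

// The switch can be flipped at runtime without rebuilding the logger.
var levelSwitch = new LoggingLevelSwitch(LogEventLevel.Information);

var logger = new LoggerConfiguration()
    .MinimumLevel.ControlledBy(levelSwitch)
    .WriteTo.Console()
    .CreateLogger();

// Later, while investigating an issue on a live environment:
levelSwitch.MinimumLevel = LogEventLevel.Debug;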

What to log?

Any valuable log record should contain at least three things.

[SystemInfo] [Context] [Message]

  • SystemInfo - timestamp, process id, thread id, or any other system tag
  • Context - the calling context, which can be shared by multiple log records; usually the name of the calling module/service/method
  • Message - log record message

Most commonly used loggers provide the SystemInfo and Context parts out of the box, so you only need to care about the Message.
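
With Serilog, for example, the layout of those parts is controlled by the sink's output template (a sketch; {SourceContext} is Serilog's built-in property for the calling context):

loggerBuilder
    // [SystemInfo]                          [Context]        [Message]
    .WriteTo.Console(outputTemplate:
        "[{Timestamp:yyyy-MM-dd HH:mm:ss} {Level:u3}] {SourceContext}: {Message:lj}{NewLine}{Exception}")
    .CreateLogger();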

Information to log:

  • Errors
  • Requests and responses
  • Valuable system information and events

Usually, it is necessary to link request and response log records. The X-Request-ID HTTP header is often used to pair them. Such a header can be generated by the client, or it can be generated on the server side. By adding it to the context of each log line, you will be able to find all messages that arose within a specific request with minimal effort. And in the case of distributed systems, the header can be passed down the chain of network calls.
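
In ASP.NET Core with Serilog, a small middleware is enough to attach the header to every log record (a sketch; it assumes .Enrich.FromLogContext() is enabled on your logger):

// Read X-Request-ID from the client, or generate one on the server side.
app.Use(async (context, next) =>
{
    var requestId = context.Request.Headers["X-Request-ID"].FirstOrDefault()
                    ?? Guid.NewGuid().ToString();
    context.Response.Headers["X-Request-ID"] = requestId;

    // Every log record written below this point carries the RequestId property.
    using (Serilog.Context.LogContext.PushProperty("RequestId", requestId))
    {
        await next();
    }
});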

Centralized Log Storage

Problem statement 2

Hey %USERNAME%, the payment did not succeed for John Doe yesterday; order id is XYZ123. I know you have access to logs. Can you check what happened?

Our services:

  • Public API gateway
  • Authorization
  • Product service
  • Customer service
  • Order service
  • Pricing service
  • Delivery Service
  • Memes service

Good luck grepping for the necessary log record in one of these log storages (whether it's a file, a DB, or a dedicated log store). The solution is pretty obvious and straightforward: we need to set up centralized log storage. Ultimately, the goal is to have a single entry point with convenient analysis, so that even Support can use the logging tool for some simple cases without developers' involvement.

CloudWatch

In case you are using AWS as your hosting provider, you can try CloudWatch for your logs. You simply need to create an IAM user with CloudWatch write access, create an access key/secret pair, and add the AWS sink to your logger instance. In the case of C# and Serilog, it looks like this:

// Register AWS credentials (Amazon.Runtime.CredentialManagement).
var profileOptions = new CredentialProfileOptions
{
    AccessKey = "AccessKey",
    SecretKey = "SecretKey",
};
new SharedCredentialsFile().RegisterProfile(
    new CredentialProfile("PROFILE_NAME", profileOptions));

// Sink configuration (AWS.Logger.SeriLog package).
var configuration = new AWSLoggerConfig
{
    LogGroup = "YOUR_LOG_GROUP",
    Region = "eu-central-1",
};

loggerBuilder
    .WriteTo.AWSSeriLog(configuration)
    .CreateLogger();

Now all logger.Log("Something happened") calls will be pre-aggregated and sent to CloudWatch as batch requests on a scheduled basis.

LogAnalytics or AppInsights

In case you are using Azure as a hosting provider, create an Azure Log Analytics workspace and an access key for it.

loggerBuilder
    .WriteTo.AzureAnalytics(workspaceId: "WORKSPACE_ID_HERE", authenticationId: "SECRET_KEY_HERE")
    .CreateLogger();

Or you can use Azure Application Insights by merely connecting the necessary package (for .NET it is Microsoft.Extensions.Logging.ApplicationInsights).
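
A minimal wiring sketch for an ASP.NET Core app ("INSTRUMENTATION_KEY" is a placeholder for your Application Insights resource):

// Program.cs
var builder = WebApplication.CreateBuilder(args);

// Routes ILogger records into Application Insights traces.
builder.Logging.AddApplicationInsights("INSTRUMENTATION_KEY");

var app = builder.Build();
app.Logger.LogInformation("Application started");
app.Run();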


ELK

"ELK" is the acronym for three open source projects: Elasticsearch, Logstash, and Kibana. Elasticsearch is a search and analytics engine. Logstash is a server‑side data processing pipeline that simultaneously ingests data from multiple sources, transforms it, and then sends it to a "stash" like Elasticsearch. Kibana lets users visualize data with charts and graphs in Elasticsearch. We will use Logz.io to simplify the setup, but the same things can be done with pure ELK as well.

var configuration = new LogzioOptions
{
    DataCenterSubDomain = "listener-eu"
};

loggerBuilder
    .WriteTo.LogzIo("YOUR_SECRET", configuration)
    .CreateLogger();

After launching the application, we can observe our messages not only in the console but also in Kibana.

Structured logging

Default flat text messages in logs are not the best we can do. We shouldn't be working with flat text messages but with objects. Thanks to such logs, we can quickly build representations of our messages in different projections and run analytics on them.

Structured logging complements this practice by providing developers with an easy way to annotate log lines with additional data. The term "structured" has a double meaning: it can mean a natively structured data format like JSON, but it often means that log lines are appended with key-value pairs (sometimes called logfmt). The additional structure allows developers to easily enrich log lines with the information they need, rather than embedding it in the message text itself.

Briefly, the main idea is to use

[Saturday, 20-Feb-21 09:37:21 UTC] [77238] OrderService: User { id: 12345, name: "John", age: 22 } ordered a product { id: 322, name: "Whiskey", price: 228 }

or

[Saturday, 20-Feb-21 09:37:21 UTC] [77238] OrderService: User userId=12345 userName=John userAge=22 ordered a product productId=322 productName=Whiskey productPrice=228

instead of

[Saturday, 20-Feb-21 09:37:21 UTC] [77238] OrderService: User 12345 ordered a product 322

Having a data structure also makes the generated logs machine-readable (the key-value convention is a compromise that both machines and humans can read). This makes them suitable for various logging tools, many of which can query records from the production environment in near real-time.
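
With Serilog, message templates give you exactly such records (a sketch; the anonymous objects stand in for your real User and Product types):

// The @ operator tells Serilog to serialize the whole object as
// structured data instead of calling ToString() on it.
Log.Information("User {@User} ordered a product {@Product}",
    new { Id = 12345, Name = "John", Age = 22 },
    new { Id = 322, Name = "Whiskey", Price = 228 });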

Most modern log storages provide the ability to quickly find the necessary log records by applying SQL-style filters or even by making a few clicks on the properties you are interested in. So you will be able to filter, say, all logs where the user's age was more than 20.

Such logs usually contain redundant data, which allows you to extract almost anything from them, even information that no one thought about in advance. For example, you can find out which HTTP path is requested the most. And if the API returns errors with code 500 (internal server error), you can look at the duration of the failed requests to assess whether database timeouts are the cause.

Maximum efficiency

Problem statement 3

Hey %USER_NAME%, I have just checked the logs for the recently failed order. There are about 500 records. It looks like the user used the system quite intensively. Can you help figure out the steps to reproduce this issue?

To make our lives easier, one more tip can be used: keep an end-to-end correlation between the logs of independent services, so that you can easily view all the activities that were caused by a specific request from the client.

Modern web frameworks have an excellent ability to pass correlation context in HTTP headers, so if your applications use direct HTTP calls for interservice communication, you can take advantage of this out-of-the-box functionality. However, if your backend architecture involves interaction through a message broker (RabbitMQ, Kafka, etc.), you still need to take care of passing the correlation context through these messages yourself.

Tracing and Monitoring

By tracing requests, we mean understanding everything that happens inside: how the request lives further inside our application. We need to build call chains, see the data inside requests and the errors in call chains, and measure response times and the number of RPS.

There are great tools for tracing, such as Jaeger and Zipkin. You can use OpenTracing, which they both support.

Usually, tracing information can be collected from three sources:

  • If you use load balancers, parse the logs from them and send them to Jaeger.
  • Service mesh, which knows how to collect and send traces by itself (i.e., Istio).
  • The services themselves, if they receive addresses through a service discovery system or communicate directly. In this case, the trace from the services can be sent directly to Jaeger.
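
For the last case, in .NET the built-in System.Diagnostics.ActivitySource is a common way for services to emit spans, which OpenTelemetry can then export to Jaeger or Zipkin (a sketch; the source and tag names are illustrative):

using System.Diagnostics;

public class OrderService
{
    // One ActivitySource per service; exporters subscribe to it by name.
    private static readonly ActivitySource Source = new("Shop.OrderService");

    public void PlaceOrder(string orderId)
    {
        // StartActivity returns null when no listener is attached.
        using var activity = Source.StartActivity("PlaceOrder");
        activity?.SetTag("order.id", orderId);
        // ... handle the order ...
    }
}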

To simplify our tracing experience, we can generate a token that binds all request lifecycle stages into one chain. So we need to generate this token, let's call it a correlation identifier, as close as possible to the start of the activity, i.e., on the gateway or directly on the client (mobile or web). Since we are dealing with a backend application today, we will simply indicate the requirement for a mandatory "X-Correlation-ID" header in all requests to the web API.

Note: X-Correlation-ID IS NOT A TRACING HEADER CONFIRMED BY THE W3C

The next step is to carry the received CorrelationId through your request handling pipeline until a response is returned. Since the application that receives the events from the message broker queue has access to all the message properties, we can use the obtained correlation identifier inside the consumer application and pass it further along the entire call chain.
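
A sketch of how this could look with RabbitMQ.Client (channel/consumer setup omitted; HandleMessage and the exchange names are illustrative):

// Publisher: attach the correlation id to the outgoing message.
var props = channel.CreateBasicProperties();
props.CorrelationId = correlationId; // taken from the incoming X-Correlation-ID

channel.BasicPublish(exchange: "orders", routingKey: "order.created",
    basicProperties: props, body: messageBody);

// Consumer: pull it back out and push it into the log context.
consumer.Received += (_, ea) =>
{
    using (Serilog.Context.LogContext.PushProperty(
        "CorrelationId", ea.BasicProperties.CorrelationId))
    {
        HandleMessage(ea.Body);
    }
};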

As a result, our logs now contain CorrelationId not only within one service but also across interactions with other applications and services.

As the next development stage, you can categorize your logs and route them to more specialized systems:

  • Errors/warnings --> Sentry
  • Tracing --> Jaeger
  • Execution time and performance --> Prometheus

As the output, you will be able to set up notifications, alerts, and any automated rules for monitoring things and taking action (restarting an instance via a webhook on fatal errors, or sending an email to the responsible engineer).

Build timeline diagrams to find out bottlenecks

Monitor the performance

GDPR and Privacy Policy

Do not forget that private information should stay private. DO NOT LOG OR STORE:

  • Credit card information
  • Cookies
  • Logins and passwords
  • Access tokens
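
With Serilog, one way to enforce this is at the destructuring level, so a careless {@User} can never leak secrets (a sketch; the User record is hypothetical):

public record User(int Id, string Name, string Password);

// Only the safe fields survive destructuring; Password is dropped.
loggerBuilder
    .Destructure.ByTransforming<User>(u => new { u.Id, u.Name })
    .CreateLogger();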

Conclusion

At least some logs are better than none.

Logs in raw view are not as useful as metrics.

A single CorrelationId will help you find problems more efficiently in a service that consists of multiple applications.

Structured logs will allow you to query them more conveniently than ordinary text logs.

And don't forget to watch the dashboards and keep monitoring.

We hope you enjoyed reading this article and share our opinion on the importance of logging. If you are interested in more topics on software development, feel free to visit Leobit Blog or contact our technical team to chat about logging :)


Written by leobit | We are a full-cycle web, .NET, and mobile application development provider for technology companies and startups.
Published by HackerNoon on 2022/06/21