Five Guidelines for Robust Logging

Written by rohantiwari | Published 2021/09/25

TL;DR: The intent of logging standards is to ensure that software developers working in different parts of the codebase, on projects that span multiple teams, follow consistent guidelines. Logging approaches can differ from team to team within an organization, but some best practices, if followed, will standardize logging and encourage good habits when developing micro-services, making it easy to debug issues even for somebody who is not completely familiar with the codebase.

Introduction

Logging needs special consideration and careful attention during micro-service development. Logs can provide comprehensive information about states, errors, and performance during a micro-service’s execution flow as it serves user requests. It is important to instrument code with logging that can trace the application's execution, and adequate instrumentation should be a required coding standard. Logs should be ‘enough’ to give insights into the system without overwhelming the developers or the system.

Guidelines

Choose the Right Log Level

Using the appropriate log level makes logs easy to read. Log levels allow the developer to filter logs based on their requirements. A developer investigating an issue will be interested in ERROR level logs, whereas a developer making sure the business logic is working as expected will be interested in INFO logs.

Logging APIs use different log levels to indicate severity, but broadly speaking the following log levels can be used as a guideline:

INFO: Normal logging of operations; minimal server-side logging to confirm that the application code is functioning normally. Use it to keep an eye on events flowing through the system: what is going on now with the business logic.

ERROR: An unexpected run-time error which should not occur during normal / healthy operations. This is not what our users expected; it records what went wrong in our business logic.

WARN: An unexpected condition that can occur occasionally even during normal operations. If it happens during normal operations, it doesn’t seem right.

DEBUG: Details about the application which can help with deeper analysis. This additional / detailed information will be useful when debugging an issue at a later time.

FATAL: A catastrophic error which causes termination of the application code. This should never happen; it records what made our application crash and / or become unrecoverable.
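As a rough illustration of level-based filtering, the sketch below uses Python's standard logging module. The logger name and messages are made up for the example, and the WARNING threshold is only an assumed production setting. Python names the levels WARNING and CRITICAL; logging.FATAL is an alias for CRITICAL.

```python
import io
import logging

# Capture output in memory so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

# "orders-service" is a hypothetical logger name used only for illustration.
logger = logging.getLogger("orders-service")
logger.addHandler(handler)
logger.propagate = False          # keep this example's output in `stream` only
logger.setLevel(logging.WARNING)  # assumed threshold: WARNING and above

logger.debug("cart contents: %s", ["sku-1", "sku-2"])    # filtered out
logger.info("order %s accepted", "o-123")                # filtered out
logger.warning("payment retry %d of %d", 1, 3)           # emitted
logger.error("payment failed for order %s", "o-123")     # emitted
logger.critical("cannot reach database, shutting down")  # emitted (FATAL equivalent)

output = stream.getvalue()
print(output, end="")
```

Lowering the threshold to logging.INFO or logging.DEBUG would let the first two messages through without touching the call sites, which is what makes consistent level choices useful.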

Efficiency

  1. Remove unnecessary logs: logging is very easy to implement, but once developers have added temporary or additional logs, they should not forget to remove them. Avoid logging redundant fields / information: if fields in a different log line already help correlate logs, do not repeat them.
  2. Log verbosity can be tailored: depending on your audience, logs can be made concise or verbose. E.g. if the target audience is developers/DevOps/SRE, then the event time can be abbreviated like eT=2021-09-12T13:00:00:000, whereas if the audience is business analysts/non-technical, then use eventTime=Sunday, 12th September, 2021 at 13:00 UTC.
  3. Consider your space/cost constraints: Excessive logging can generate large log files and can incur large recurring costs to the business. Java full stack traces can take up considerable payload space. Instead of the entire stack trace, consider logging error code/status/message in error handling code.
  4. Logging in a loop: Do not log the result of every iteration of a loop (especially for large loops). Consider logging a summary at the end of the operation.
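The last two points can be sketched together. The example below is a minimal illustration using Python's standard logging module: the function name import_records, the logger name, and the int() call standing in for real work are all assumptions. It logs one summary line after the loop and records error types by count instead of emitting a stack trace per failure.

```python
import io
import logging

stream = io.StringIO()
logger = logging.getLogger("batch-import")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False
logger.setLevel(logging.INFO)


def import_records(records):
    """Process records, logging one summary line instead of one line per item."""
    failures = {}  # error type name -> number of occurrences
    for item in records:
        try:
            int(item)  # stand-in for the real per-record work
        except ValueError as exc:
            # Count by error type rather than logging a full stack trace each time.
            name = type(exc).__name__
            failures[name] = failures.get(name, 0) + 1
    failed = sum(failures.values())
    logger.info("import finished total=%d ok=%d failed=%d errors=%s",
                len(records), len(records) - failed, failed, failures)
    return failed


import_records(["1", "2", "x", "4", "y"])
summary = stream.getvalue()
```

However many records are processed, the log grows by a single line, and the per-type counts usually say enough to decide whether a deeper look is needed.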

Performance

  1. Correct use of a logging framework should have little to no impact on application performance, since a logging framework should consume minimal resources.
  2. Generating log messages should not slow down the application. Invoking toString() on complex objects while building a log message is not recommended: string operations can be surprisingly expensive, so avoiding them matters.
  3. Ensure that log statements do not cause failures themselves, e.g. check that an object is not null before calling toString() on it in a log statement.
  4. Logging frameworks offer settings to show the class, method, and line number that generated the log output. Since this information is expensive to produce, the setting should be switched off in production.
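The cost of building messages for disabled levels can be shown with a small Python sketch, where __str__ plays the role of toString(). The Order class and its call counter are hypothetical and exist only to make the effect observable. Passing the object as an argument defers the conversion, while formatting the string up front pays the cost even when the message is discarded.

```python
import io
import logging


class Order:
    """Hypothetical domain object whose string form is costly to build."""
    str_calls = 0  # counts how often the string form was actually built

    def __init__(self, items):
        self.items = items

    def __str__(self):
        Order.str_calls += 1
        return "Order(" + ", ".join(self.items) + ")"


stream = io.StringIO()
logger = logging.getLogger("checkout")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False
logger.setLevel(logging.INFO)  # DEBUG is disabled

order = Order(["sku-1", "sku-2"])

# Deferred formatting: the argument is only converted if the record is emitted.
logger.debug("processing %s", order)
calls_after_lazy = Order.str_calls

# Eager formatting: the f-string calls __str__ before the level is checked.
logger.debug(f"processing {order}")
calls_after_eager = Order.str_calls

# Guard genuinely expensive preparation behind an explicit level check.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("items per sku: %s", {i: order.items.count(i) for i in order.items})

logger.info("order accepted: %s", order)  # emitted, so __str__ runs here
```

Java logging frameworks offer comparable mechanisms, such as parameterized messages and level checks, so the same habit carries over.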

Security and Privacy

  1. Avoid logging Sensitive, Confidential or Personally Identifiable Information (PII): Do not log SSN, credit card numbers, phone numbers, addresses.
  2. Avoid logging Authentication information: Do not log any credentials, keys, access tokens, etc. that were used to authenticate or authorize the user.
  3. Avoid Logging User Generated Content (UGC): Do not log chat message contents, email contents, etc. which can contain confidential data.
  4. Sanitize data: Consider using techniques like hashing, scrambling or pseudonymization to remove any association of the data with a particular user, so that log data cannot be traced back to that user.
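A minimal sketch of the sanitizing idea, using Python's standard library, might look like the following. The key, the regular expressions, and the names pseudonymize and RedactingFilter are assumptions made for illustration. The patterns are deliberately simple and would miss many real formats, so a filter like this is a safety net rather than a substitute for not logging sensitive data in the first place.

```python
import hashlib
import hmac
import io
import logging
import re

# Assumption: a real key would come from a secret store, not from source code.
PSEUDONYM_KEY = b"example-key-not-for-production"

# Illustrative patterns only; real detection of sensitive data needs more care.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def pseudonymize(user_id: str) -> str:
    """Return a stable keyed hash so log lines correlate without naming the user."""
    digest = hmac.new(PSEUDONYM_KEY, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


class RedactingFilter(logging.Filter):
    """Mask SSN-like and card-like numbers in the final message text."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        message = SSN_PATTERN.sub("[REDACTED-SSN]", message)
        message = CARD_PATTERN.sub("[REDACTED-CARD]", message)
        record.msg, record.args = message, ()  # replace with the sanitized text
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactingFilter())
logger = logging.getLogger("payments")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False
logger.setLevel(logging.INFO)

logger.info("user=%s paid with card %s, ssn on file %s",
            pseudonymize("alice@example.com"), "4111 1111 1111 1111", "123-45-6789")
logged = stream.getvalue()
```

Because the keyed hash is deterministic, all log lines for the same user still share one identifier, which keeps them correlatable for debugging without exposing who the user is.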

Quality and Measurement

  1. Code Review: Develop logging standards for the team and carefully inspect log lines during the code review process.
  2. Pre-production Environment Test: If there is a staging or a pre-production environment, check newly added or updated logs (level, information, etc.) before deploying the build in a production environment.
  3. Monitor what you log: Simple mechanisms and tools that let developers monitor services from a logging point of view will inculcate good habits, for example dashboards that track daily/weekly/monthly log size by micro-service, dashboards that count the number of logs per level, and trends for the size of items logged.
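One lightweight way to collect the numbers behind such dashboards is a handler that counts records instead of writing them. The sketch below uses Python's standard logging module. LogStatsHandler and the logger name are hypothetical, and exporting the counters to an actual dashboard is left out.

```python
import collections
import logging


class LogStatsHandler(logging.Handler):
    """Count records and message bytes per level; a hypothetical dashboard feed."""

    def __init__(self):
        super().__init__()
        self.count_by_level = collections.Counter()
        self.bytes_by_level = collections.Counter()

    def emit(self, record: logging.LogRecord) -> None:
        # Measure the formatted message as it would be written out in UTF-8.
        size = len(record.getMessage().encode("utf-8"))
        self.count_by_level[record.levelname] += 1
        self.bytes_by_level[record.levelname] += size


stats = LogStatsHandler()
logger = logging.getLogger("inventory-service")  # hypothetical logger name
logger.addHandler(stats)
logger.propagate = False
logger.setLevel(logging.INFO)

logger.info("stock refreshed")
logger.info("stock refreshed")
logger.error("sku %s missing", "sku-9")
logger.debug("ignored at INFO threshold")  # below the threshold, never counted

print(dict(stats.count_by_level), dict(stats.bytes_by_level))
```

Sampling these counters periodically per micro-service would give the per-level counts and size trends described above.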

Conclusion

Logging is a very powerful tool that developers can use to gain insights into their micro-services, and it also serves as an important diagnostic tool. Logging at the appropriate level with enough data and context is equivalent to placing cameras at various spots in the code. Logs are indispensable when it comes to debugging issues and gauging system performance.

Reference

https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html


Written by rohantiwari | Rohan has 10+ years of experience building large scale distributed systems, developer frameworks and bigdata platforms.
Published by HackerNoon on 2021/09/25