consul2dogstats, with Dimensional Tagging

Written by otterley | Published 2017/05/02
Tech Story Tags: devops | metrics | sre | monitoring

If you use both Consul and Datadog, you’ll want this.

I’ve written a small program called consul2dogstats that maintains a count of the Consul services in your datacenter(s). It periodically polls the Consul service catalog and health endpoints to collect an inventory, then publishes the resulting counts to Datadog. It’s licensed under Apache 2.0.
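
The counting step can be pictured roughly as follows. This is a minimal sketch, not the actual consul2dogstats implementation: it assumes the response shape of Consul’s /v1/health/service/<name> endpoint (a list of entries, each with a Checks array whose items carry a Status), and the metric name consul.service.count and the tag keys are hypothetical. The output lines use the DogStatsD gauge format.

```python
from collections import Counter


def instance_status(entry):
    """Collapse one health entry's checks into a single status (worst wins)."""
    statuses = {check["Status"] for check in entry.get("Checks", [])}
    for status in ("critical", "warning"):
        if status in statuses:
            return status
    return "passing"


def service_counts(health_by_service):
    """Count instances per (service, status).

    health_by_service maps a service name to the parsed JSON list that
    /v1/health/service/<name> returns for it.
    """
    counts = Counter()
    for service, entries in health_by_service.items():
        for entry in entries:
            counts[(service, instance_status(entry))] += 1
    return counts


def to_dogstatsd(counts, datacenter):
    """Render counts as DogStatsD gauge datagrams (name:value|g|#tags)."""
    lines = []
    for (service, status), n in sorted(counts.items()):
        # Metric name and tag keys are assumptions made for this example.
        tags = f"service:{service},status:{status},datacenter:{datacenter}"
        lines.append(f"consul.service.count:{n}|g|#{tags}")
    return lines


# Hand-written sample data standing in for real Consul responses.
sample = {
    "api": [
        {"Checks": [{"Status": "passing"}]},
        {"Checks": [{"Status": "passing"}, {"Status": "critical"}]},
    ],
    "frontend": [{"Checks": [{"Status": "passing"}]}],
}
lines = to_dogstatsd(service_counts(sample), "dc1")
```

A real poller would fetch the service names from /v1/catalog/services, request the health data for each, and send the rendered lines to a local DogStatsD agent on a timer.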

An aside on dimensional tagging

One of Datadog’s most powerful features is dimensional tagging. Briefly, dimensional tags are key-value pairs you can attach to time-series metrics for the purpose of querying and aggregation.

Suppose you wanted to track the request rates of the web servers in your organization. Let’s call the metric http_requests. Some useful tags we could apply would be:

  • Server host (e.g., app1, ip-1-2-3-4)
  • Web service (e.g., frontend, api, gatekeeper)
  • Environment (e.g., production, test)
  • Response code range (e.g., 2xx, 3xx, 4xx, 5xx)

In Datadog, you would assign a dimension key to each of those. In the examples above, those might be hostname, http_host, env, and http_response_type.

Once you start tagging metrics, aggregating and filtering them is a piece of cake. You can perform powerful queries to answer questions in real time such as:

  • How many 5xx responses per minute am I serving across all of production?
  • How many 2xx responses am I sending per minute in my production environment for the gatekeeper web service?
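
To make these questions concrete, here is a small in-memory sketch of how tagged points can be filtered and summed. It is an illustration only, not Datadog’s query engine; the sample points and the total helper are made up for the example, and the tag keys are the ones suggested earlier.

```python
def point(value, hostname, http_host, env, response_type):
    """Build one sample point: a per-minute request count plus its tags."""
    tags = {
        "hostname": hostname,
        "http_host": http_host,
        "env": env,
        "http_response_type": response_type,
    }
    return value, tags


# Hypothetical http_requests samples for a single minute.
points = [
    point(120, "app1", "frontend", "production", "2xx"),
    point(3, "app1", "frontend", "production", "5xx"),
    point(80, "app2", "gatekeeper", "production", "2xx"),
    point(2, "app2", "gatekeeper", "production", "5xx"),
    point(40, "app3", "gatekeeper", "production", "2xx"),
    point(7, "app4", "gatekeeper", "test", "5xx"),
]


def total(points, **wanted):
    """Sum the values of all points whose tags match every wanted pair."""
    return sum(
        value
        for value, tags in points
        if all(tags.get(key) == val for key, val in wanted.items())
    )


# 5xx responses per minute across all of production.
prod_5xx = total(points, env="production", http_response_type="5xx")

# 2xx responses per minute in production for the gatekeeper web service.
gatekeeper_2xx = total(
    points, env="production", http_host="gatekeeper", http_response_type="2xx"
)
```

Each question becomes a different set of key-value filters over the same metric; no new metric names are needed.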

This is a tremendous leap in functionality and flexibility over older solutions that have no dimension other than the server host (which itself is a decreasingly useful dimension in cloud environments, since hosts are often ephemeral now and their canonical names meaningless).

Consider a traditional dot-separated form of metric names such as requests.production.frontend.2xx or requests.test.gatekeeper.4xx. These pose some significant challenges for analysts.

First, selecting the metrics to query is an O(n) operation in the number of metric names, because each name is opaque (the system has no semantic awareness of the naming structure you choose) and must be matched individually. If you’re collecting a lot of metrics (and you should be!), that’s a very expensive operation.
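
The difference can be sketched as follows. With opaque names, a selection has to test every name against a pattern; with tags, a system can keep an index from each key-value pair to the series that carry it and answer a query by intersecting a few sets. This is a toy illustration under that assumption, not a description of how Datadog or any particular backend stores its data, and all names in it are hypothetical.

```python
import fnmatch
from collections import defaultdict

flat_names = [
    "requests.production.frontend.2xx",
    "requests.production.frontend.5xx",
    "requests.production.gatekeeper.5xx",
    "requests.test.gatekeeper.4xx",
]


def select_flat(names, pattern):
    """Flat names: every name is compared against the pattern, O(n)."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


def build_index(series):
    """Tagged series: map each (key, value) pair to the ids that carry it."""
    index = defaultdict(set)
    for series_id, tags in series.items():
        for pair in tags.items():
            index[pair].add(series_id)
    return index


def select_tagged(index, **wanted):
    """Intersect the id sets for the requested pairs; no scan over all series."""
    sets = [index.get(pair, set()) for pair in wanted.items()]
    return set.intersection(*sets) if sets else set()


# The same four series as above, expressed with tags instead of name parts.
series = {
    1: {"env": "production", "http_host": "frontend", "http_response_type": "2xx"},
    2: {"env": "production", "http_host": "frontend", "http_response_type": "5xx"},
    3: {"env": "production", "http_host": "gatekeeper", "http_response_type": "5xx"},
    4: {"env": "test", "http_host": "gatekeeper", "http_response_type": "4xx"},
}
index = build_index(series)

prod_5xx_flat = select_flat(flat_names, "requests.production.*.5xx")
prod_5xx_tagged = select_tagged(index, env="production", http_response_type="5xx")
```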

Second, adding new dimensions is a management nightmare, because every new dimension requires a new metric name. Suppose we wanted to add a new dimension like shard to our metrics above. In a system that supports dimensional tagging out of the box, that’s no problem: any previous graphs will continue to render as before. But if you’re using flat metrics, you’ll start creating new metrics like requests.production.frontend.shard42.2xx and requests.test.gatekeeper.shard1.4xx. The history of the previous metric data will end once you replace the old metrics with new ones, creating a discontinuity that makes temporal analysis more challenging. (You can, of course, continue to submit metrics under the old name in parallel with the new, more granular metrics. But you’ll be double-counting, which could cause analytical errors, and submitting more metrics to the system than necessary, which could cause capacity issues.)
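
A small sketch of why the tagged form survives the change. The data and helper names here are hypothetical; the point is only that a query written before the shard tag existed keeps matching afterwards, while an exact flat-name lookup stops matching once the name changes.

```python
def total_tagged(points, **wanted):
    """Sum points whose tags include every wanted pair; extra tags are ignored."""
    return sum(value for value, tags in points if wanted.items() <= tags.items())


def total_flat(points, name):
    """Sum points stored under exactly this flat metric name."""
    return sum(value for value, metric in points if metric == name)


base = {"env": "production", "http_host": "frontend", "http_response_type": "2xx"}

# Tagged points before and after the shard dimension is introduced.
tagged_before = [(100, dict(base))]
tagged_after = [
    (60, dict(base, shard="shard42")),
    (40, dict(base, shard="shard43")),
]

# The same traffic under flat names.
flat_before = [(100, "requests.production.frontend.2xx")]
flat_after = [
    (60, "requests.production.frontend.shard42.2xx"),
    (40, "requests.production.frontend.shard43.2xx"),
]

# The "old graph": a query defined before shards existed.
old_tagged_before = total_tagged(tagged_before, **base)
old_tagged_after = total_tagged(tagged_after, **base)
old_flat_before = total_flat(flat_before, "requests.production.frontend.2xx")
old_flat_after = total_flat(flat_after, "requests.production.frontend.2xx")
```

The tagged query returns the same total before and after, because it simply sums across the new shard values. The flat lookup drops to zero, which is the discontinuity described above.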

The good news is that even if you prefer managing metrics yourself over using a SaaS provider such as Datadog, there are now open-source metrics systems that support dimensional tagging. There’s no need to suffer with flat metrics anymore.

Dimension anti-patterns

Typically, operational metrics systems are intended to show the real-time state and health of your systems and applications. In order to maintain good performance, it’s important not to make metric dimensions too fine-grained. A good rule of thumb is to ensure that the possible values for a particular key are fewer than the number of servers you have. So tagging metrics by environment, hostname, or function is good. But tagging them by customer or user ID is a very bad idea; it will create so many tables on the backend that performance will suffer terribly.
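
One way to apply this rule of thumb is to count the distinct values seen for each tag key and compare that with the size of your fleet. The sketch below is hypothetical (the helper name, the sample tags, and the customer_id key are invented for illustration), and the threshold is the heuristic from the paragraph above, not a limit enforced by Datadog. It flags keys with more distinct values than there are servers, so hostname itself passes.

```python
from collections import defaultdict


def high_cardinality_keys(tag_sets, server_count):
    """Return tag keys that have more distinct values than there are servers."""
    values = defaultdict(set)
    for tags in tag_sets:
        for key, value in tags.items():
            values[key].add(value)
    return sorted(key for key, seen in values.items() if len(seen) > server_count)


# Sample tag sets from a two-server fleet that also tags by customer.
observed = [
    {"hostname": "app1", "env": "production", "customer_id": "c1"},
    {"hostname": "app1", "env": "production", "customer_id": "c2"},
    {"hostname": "app2", "env": "production", "customer_id": "c3"},
    {"hostname": "app2", "env": "production", "customer_id": "c4"},
    {"hostname": "app2", "env": "production", "customer_id": "c5"},
]

too_fine = high_cardinality_keys(observed, server_count=2)
```

Here only customer_id is flagged: it has five distinct values on a fleet of two servers, while hostname and env stay within the limit.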

Put bluntly: Don’t use your operational metrics system as your analytics system. There are lots of great analytics databases and providers out there, including Vertica, Redshift, Clickhouse, and BigQuery.

A promising landscape

Datadog was the first platform, commercial or otherwise, that offered a compelling product that supported dimensional metric tagging. But there are lots more options out there now. Check them out and discover the one that’s best for you:

Commercial

Open-Source


Published by HackerNoon on 2017/05/02