Elastic (ELK) Stack Tips and Tricks for Transforming Log Data
If you’re using the Elastic (ELK) Stack and are interested in mapping custom Logstash logs to Elasticsearch, then this post is for you.
The ELK Stack is an acronym for three open source projects: Elasticsearch, Logstash, and Kibana. Together, they form a log management platform.
- Elasticsearch is a search and analytics engine.
- Logstash is a server‑side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to a “stash” like Elasticsearch.
- Kibana lets users visualize the data in Elasticsearch with charts and graphs.
Beats came later on and is a lightweight data shipper. The introduction of Beats transformed the ELK Stack into the Elastic Stack, but that is beside the point.
This article focuses on Grok, which is a feature within Logstash that can transform your logs before they are forwarded to a stash. For our purposes, I will only talk about processing data from Logstash to Elasticsearch.
Grok
Grok is a filter within Logstash that is used to parse unstructured data into something structured and queryable. It sits on top of regular expressions (regex) and uses text patterns to match lines in log files.
As we will see in the following sections, using Grok makes a big difference when it comes to effective log management.
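To make that concrete, the named patterns expand to regular expressions behind the scenes, so a Grok pattern is essentially a readable shorthand for named capture groups. Here is a rough sketch, slightly simplified from the stock pattern definitions:
Grok pattern with named fields:
%{WORD:environment} %{WORD:method}
Roughly the regex it expands to:
(?<environment>\b\w+\b) (?<method>\b\w+\b)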
Without Grok your Log Data is Unstructured
A single log line in Kibana.
Without Grok, when logs are sent from Logstash to Elasticsearch and rendered in Kibana, the entire log line appears under a single message field.
Querying for meaningful information is difficult in this situation because all of the log data is stored in one key. It would be better if the log message were broken out into separate, well-named fields.
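To illustrate, a log line indexed this way ends up as a document that looks roughly like the following (trimmed for brevity; metadata fields such as @timestamp and host will vary with your setup):
{
  "@timestamp": "2019-05-21T08:15:27.000Z",
  "host": "my-server",
  "message": "localhost GET /v2/applink/5c2f4bb3e9fda1234edc64d 400 46ms 5bc6e716b5d6cb35fc9687c0"
}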
Log Data
Unstructured
localhost GET /v2/applink/5c2f4bb3e9fda1234edc64d 400 46ms 5bc6e716b5d6cb35fc9687c0
If you take a closer look at the raw data, you can see that it’s actually made up of different parts, each separated by a space delimiter.
If you are a more experienced developer, you can probably guess what each part means and recognize that it’s a log message from an API call. Each item is broken down below.
Structured
- localhost == environment
- GET == method
- /v2/applink/5c2f4bb3e9fda1234edc64d == url
- 400 == response_status
- 46ms == response_time
- 5bc6e716b5d6cb35fc9687c0 == user_id
As the structured breakdown shows, there is an underlying order to the unstructured log. The next step, then, is to programmatically refine the raw data. This is where Grok shines.
Grok Patterns
Built In
Logstash comes with over 100 built-in patterns for structuring unstructured data. You should definitely take advantage of these whenever possible for common system logs like Apache, Linux, HAProxy, AWS, and so forth.
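For instance, a standard Apache combined access log can usually be handled with a single stock pattern and no hand-built regex at all. A minimal filter would look something like this (COMBINEDAPACHELOG is one of the bundled patterns; newer Logstash releases also expose it as HTTPD_COMBINEDLOG):
filter {
  grok {
    # One bundled pattern parses the whole Apache combined log format
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}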
However, what happens when you have custom logs like the example above? You have to build your own custom Grok pattern.
Custom
It takes trial and error to build your own custom Grok pattern. I used the Grok Debugger and the Grok Patterns reference to figure mine out.
Please note that the syntax for Grok patterns is: %{SYNTAX:SEMANTIC}
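Here, SYNTAX is the name of the pattern to match against (built-in or custom) and SEMANTIC is the field name to store the matched value under. Taking one piece of our example log:
%{NUMBER:response_status}
This matches the 400 in the log line and stores it in a field called response_status.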
The first thing I tried was the Discover tab in the Grok Debugger. I thought it would be great if this tool could auto-generate the Grok pattern, but it wasn’t too helpful, as it only found two matches.
Grok Debugger ‘Discover’ only matched 2 words
Using this discovery, I began building my own pattern in the Grok Debugger using the syntax found on Elastic’s GitHub page.
https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns
After playing around with different syntaxes, I was finally able to structure the log data the way I wanted.
Structuring Unstructured Log Data with Grok Debugger
localhost GET /v2/applink/5c2f4bb3e9fda1234edc64d 400 46ms 5bc6e716b5d6cb35fc9687c0
%{WORD:environment} %{WORD:method} %{URIPATH:url} %{NUMBER:response_status} %{WORD:response_time} %{USERNAME:user_id}
{"environment": [["localhost"]],"method": [["GET"]],"url": [["/v2/applink/5c2f4bb3e9fda1234edc64d"]],"response_status": [["400"]],"BASE10NUM": [["400"]],"response_time": [["46ms"]],"user_id": [["5bc6e716b5d6cb35fc9687c0"]]}
With the Grok pattern in hand and the data mapped, the final step is to add it to Logstash.
Update Logstash.conf
On the server where you installed the ELK Stack, open the Logstash configuration file.
sudo vi /etc/logstash/conf.d/logstash.conf
Paste in the changes.
input {
  file {
    path => "/your_logs/*.log"
  }
}
filter {
  grok {
    match => { "message" => "%{WORD:environment} %{WORD:method} %{URIPATH:url} %{NUMBER:response_status} %{WORD:response_time} %{USERNAME:user_id}" }
  }
}
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
  }
}
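Before restarting, it can be worth checking that the file parses cleanly. Assuming a standard package install (adjust the paths if yours differ), Logstash has a config-test flag for exactly this:
sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/logstash.conf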
After you save the changes, restart Logstash and check its status to make sure that it’s still working.
sudo service logstash restart
sudo service logstash status
Lastly, to make sure that the changes take effect, be sure to refresh the Elasticsearch index for Logstash in Kibana!
Refresh the Elasticsearch index for Logstash in Kibana
With Grok your Log Data is Structured!
Grok automatically structures unstructured logs
As we can see in the image above, Grok automatically maps the log data into separate fields in Elasticsearch. This makes it easier to manage your logs and to quickly query for information. Instead of digging through log files to debug, you can simply filter by what you’re looking for, like environment or url.
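For example, once those fields exist, pulling up every failed request from a given environment is a one-line query in Kibana’s search bar (using the field names we defined in the Grok pattern):
environment:localhost AND response_status:400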
Try giving Grok expressions a shot! If you have another way of doing this or you have any problems with examples above, just drop a comment below to let me know.
Thanks for reading — and please follow me here on Medium for more interesting software engineering articles!
Resources
https://www.elastic.co/blog/do-you-grok-grok
https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns