Centralised logging for AWS Lambda, REVISED (2018)

Written by theburningmonk | Published 2018/07/23

First of all, I would like to thank all of you for following and reading my content. My post on centralised logging for AWS Lambda has been viewed more than 20K times by now, so it is clearly a challenge that many of you have run into.

In the post, I outlined an approach of using a Lambda function to ship all your Lambda logs from CloudWatch Logs to a log aggregation service such as Logz.io.

In the demo project, I also included functions to:

  • auto-subscribe new log groups to the log-shipping function
  • auto-update the retention policy of new log groups to X number of days (the default is Never Expire, which has a long-term cost impact)
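The two helper functions above can be sketched roughly as follows. This is a minimal illustration, not the code from the demo project: it assumes the function is triggered by the CloudTrail `CreateLogGroup` event delivered through CloudWatch Events, and that `logs_client` is a boto3 CloudWatch Logs client (or anything with the same two methods). The names `handle_new_log_group`, `FILTER_NAME` and `RETENTION_DAYS` are made up for this example.

```python
from typing import Any, Dict

RETENTION_DAYS = 7         # assumed value; pick whatever retention suits you
FILTER_NAME = "ship-logs"  # hypothetical subscription filter name


def log_group_from_event(event: Dict[str, Any]) -> str:
    """Extract the new log group's name from a CloudTrail CreateLogGroup event."""
    return event["detail"]["requestParameters"]["logGroupName"]


def handle_new_log_group(
    event: Dict[str, Any],
    logs_client: Any,
    destination_arn: str,
    retention_days: int = RETENTION_DAYS,
) -> str:
    """Set the retention policy and subscribe the new log group to the shipper."""
    name = log_group_from_event(event)

    # Replace the default "Never Expire" retention with a fixed number of days.
    logs_client.put_retention_policy(
        logGroupName=name,
        retentionInDays=retention_days,
    )

    # An empty filter pattern forwards every log event to the destination.
    logs_client.put_subscription_filter(
        logGroupName=name,
        filterName=FILTER_NAME,
        filterPattern="",
        destinationArn=destination_arn,
    )
    return name
```

In a real deployment you would pass `boto3.client("logs")` as `logs_client`. When the destination is a Lambda function, CloudWatch Logs also needs permission to invoke it, and you would want to skip the log-shipping function's own log group so it does not feed itself.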

This approach works well when you start out. However, you can run into some serious problems at scale.

Mind the concurrency

When processing CloudWatch Logs with a Lambda function, you need to be mindful of the number of concurrent executions it creates, because CloudWatch Logs is an asynchronous event source for Lambda.

When you have 100 functions running concurrently, they will each push logs to CloudWatch Logs. This in turn can trigger 100 concurrent executions of the log shipping function, which can potentially double the number of functions that are concurrently running in your region. Remember, there is a soft, regional limit of 1000 concurrent executions for all functions!

This means your log shipping function can cause cascading failures throughout your entire application. Critical functions can be throttled because too many executions are used to push logs out of CloudWatch Logs. Not a good way to go down ;-)

You can set the Reserved Concurrency for the log shipping function, to limit its max number of concurrent executions. However, you risk losing logs when the log shipping function is throttled.
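To make the arithmetic concrete, here is a rough sketch. `peak_concurrency` is a deliberately simple model (one log shipper execution per concurrently running application function, optionally capped), and `cap_log_shipper` shows how the cap could be applied with boto3's `put_function_concurrency`. Both function names are made up for this example, and `lambda_client` is assumed to be a boto3 Lambda client.

```python
from typing import Any, Optional


def peak_concurrency(app_concurrency: int, shipper_cap: Optional[int] = None) -> int:
    """Worst-case concurrent executions: app functions plus log shippers.

    Assumes, as a simplification, that every concurrently running function
    can trigger one concurrent execution of the log shipping function.
    """
    shippers = app_concurrency
    if shipper_cap is not None:
        shippers = min(shippers, shipper_cap)
    return app_concurrency + shippers


def cap_log_shipper(lambda_client: Any, function_name: str, max_concurrency: int) -> Any:
    """Set Reserved Concurrency on the log shipping function."""
    if max_concurrency < 0:
        raise ValueError("max_concurrency must be non-negative")
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=max_concurrency,
    )
```

With 100 application functions running, the uncapped model gives 200 concurrent executions, while a cap of 10 on the shipper gives 110. The cap protects the rest of the application, but, as noted above, throttled shipper invocations can mean lost logs.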

You can also request a raise to the regional limit and make it so high that you don’t have to worry about throttling.

A better approach at scale is to use Kinesis

However, I would suggest that a better approach is to stream the logs from CloudWatch Logs to a Kinesis stream first. From there, a Lambda function can process the logs and forward them on to a log aggregation service.
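As a rough sketch of what the processing function has to do: each Kinesis record that Lambda receives carries base64-encoded data, and CloudWatch Logs writes that data as gzip-compressed JSON containing a batch of log events. The `ship` callable below is a placeholder for whatever sends the events to your log aggregation service; it is an assumption of this example, not part of any AWS API.

```python
import base64
import gzip
import json
from typing import Any, Callable, Dict, List


def parse_cloudwatch_record(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Decode one Kinesis record written by a CloudWatch Logs subscription."""
    compressed = base64.b64decode(record["kinesis"]["data"])
    payload = json.loads(gzip.decompress(compressed))

    # CloudWatch Logs also sends CONTROL_MESSAGE records to check that the
    # destination is reachable; they carry no log data, so skip them.
    if payload.get("messageType") != "DATA_MESSAGE":
        return []

    return [
        {
            "logGroup": payload["logGroup"],
            "logStream": payload["logStream"],
            "timestamp": log_event["timestamp"],
            "message": log_event["message"],
        }
        for log_event in payload["logEvents"]
    ]


def handler(event: Dict[str, Any], context: Any,
            ship: Callable[[List[Dict[str, Any]]], Any] = print) -> int:
    """Lambda entry point: flatten every record in the batch and ship it."""
    log_events = [
        parsed
        for record in event["Records"]
        for parsed in parse_cloudwatch_record(record)
    ]
    if log_events:
        ship(log_events)  # hypothetical: replace with a call to your aggregator
    return len(log_events)
```

Because Lambda polls Kinesis in batches, one invocation can handle log events from many log groups at once, rather than one invocation per log group delivery.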

With this approach, you control the concurrency of the log shipping function. As the number of log events increases, you can increase the number of shards in the Kinesis stream. This would also increase the number of concurrent executions of the log shipping function.
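For a feel of how shard count relates to log volume, here is a back-of-the-envelope sketch. It relies on the per-shard write limits of roughly 1 MB per second and 1,000 records per second. The function names are made up for this example, and `kinesis_client` is assumed to be a boto3 Kinesis client.

```python
import math
from typing import Any

SHARD_BYTES_PER_SEC = 1_000_000  # approx. write limit per shard (1 MB/s)
SHARD_RECORDS_PER_SEC = 1_000    # write limit per shard (records/s)


def shards_needed(bytes_per_sec: float, records_per_sec: float) -> int:
    """Smallest shard count that fits both write limits (never below 1)."""
    by_bytes = math.ceil(bytes_per_sec / SHARD_BYTES_PER_SEC)
    by_records = math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC)
    return max(1, by_bytes, by_records)


def resize_stream(kinesis_client: Any, stream_name: str, target_shards: int) -> Any:
    """Ask Kinesis to scale the stream to the target number of shards."""
    return kinesis_client.update_shard_count(
        StreamName=stream_name,
        TargetShardCount=target_shards,
        ScalingType="UNIFORM_SCALING",
    )
```

For example, about 2.5 MB of log data per second needs at least 3 shards by this estimate. Treat the result as a starting point only: CloudWatch Logs batches log events into records, so the record rate is usually far lower than the number of log lines.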

Take a look at this repo to see how it works. It has a nearly identical setup to the demo project for the previous post:

  • a set-retention function that automatically updates the retention policy for new log groups to 7 days
  • a subscribe function that automatically subscribes new log groups to a Kinesis stream
  • a ship-logs-to-logzio function that processes the log events from the above Kinesis stream and ships them to Logz.io
  • a process_all script to subscribe all existing log groups to the same Kinesis stream
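The idea behind a script like process_all can be sketched as below. This is not the code from the repo, just a minimal illustration using boto3's CloudWatch Logs paginator. The function name, the `/aws/lambda/` prefix, the filter name and the `exclude` parameter are all assumptions for this example. `role_arn` must be an IAM role that CloudWatch Logs can assume to write to the stream.

```python
from typing import Any, Iterable, List


def subscribe_all(
    logs_client: Any,
    stream_arn: str,
    role_arn: str,
    prefix: str = "/aws/lambda/",      # assumed: only Lambda log groups
    filter_name: str = "ship-logs",    # hypothetical filter name
    exclude: Iterable[str] = (),       # e.g. the shipper's own log group
) -> List[str]:
    """Subscribe every existing log group under `prefix` to a Kinesis stream."""
    excluded = set(exclude)
    subscribed: List[str] = []

    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate(logGroupNamePrefix=prefix):
        for group in page["logGroups"]:
            name = group["logGroupName"]
            if name in excluded:
                continue
            logs_client.put_subscription_filter(
                logGroupName=name,
                filterName=filter_name,
                filterPattern="",  # empty pattern matches every log event
                destinationArn=stream_arn,
                roleArn=role_arn,
            )
            subscribed.append(name)
    return subscribed
```

You would call it with `boto3.client("logs")` as `logs_client`. Excluding the log shipping function's own log group is worth considering, since otherwise its logs would flow back into the stream it reads from.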

You should also check out this post to see how you can autoscale Kinesis streams using CloudWatch and Lambda.

Hi, my name is Yan Cui. I’m an AWS Serverless Hero and the author of Production-Ready Serverless. I have run production workloads at scale in AWS for nearly 10 years and I have been an architect or principal engineer in a variety of industries ranging from banking, e-commerce and sports streaming to mobile gaming. I currently work as an independent consultant focused on AWS and serverless.

Written by theburningmonk | AWS Serverless Hero. Independent Consultant. Developer Advocate at Lumigo.
Published by HackerNoon on 2018/07/23