Managing a multi-environment serverless architecture in AWS

Written by sgarcez | Published 2018/01/30

At 2PAx we’re in the process of migrating our REST API from a traditional monolithic application to a serverless architecture. Before taking this step we wanted to know how it would affect local development and what would be required in order to maintain our current deployment strategy involving multiple environments. This article is a summary of our investigation, including the approaches we tried and the obstacles we met along the way.

Note: We will be using the term ‘environment’ and not ‘stage’ when referring to our logical develop, staging and production setups in order to avoid confusion with AWS resource naming.

Objective

Our pipeline contains three different environments: develop, staging and production. Source code commits merged into master should be deployed automatically to the develop environment, whereas transitioning to the other two environments requires manual approval. This strategy was easy to achieve with the monolithic backend we have used until now (a single compiled Go binary) but poses a few challenges with a serverless architecture, as we now need to deploy a number of different resources.

The three environments in our pipeline: develop, staging and production.

Serverless web applications on AWS

Before we get into the details of deployment let’s briefly look at the components involved in a typical serverless setup and what concepts AWS provides to handle multiple environments.

Client requests are routed and validated by API Gateway before being handled by AWS Lambda.

At a basic level, an incoming client request will be routed through API Gateway, a Lambda authoriser function and finally a Lambda endpoint handler.

API Gateway takes care of validating request parameters and bodies, caching, rate limiting, etc., and can be conveniently configured using an OpenAPI spec, formerly known as Swagger. This reduces the amount of boilerplate code and helps keep the actual request handling logic simple.

The endpoint handler should not have to worry about authenticating or authorising the user, which is why both are handled by an authoriser Lambda function, acting as an auth middleware (validating JSON Web Tokens in our case) and returning an IAM policy. Finally, the validated and authorised request is handled by another Lambda function and its result mapped by API Gateway before returning the response to the client.
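
To make this concrete, here is roughly how such an authoriser can be wired up in an OpenAPI (Swagger 2.0) spec for API Gateway; the region, account id and function name below are placeholders:

securityDefinitions:
  JWTAuthoriser:
    type: apiKey
    name: Authorization    # the token is read from the Authorization header
    in: header
    x-amazon-apigateway-authtype: custom
    x-amazon-apigateway-authorizer:
      type: token
      # invocation ARN of the authoriser Lambda (placeholder region/account/function)
      authorizerUri: arn:aws:apigateway:eu-west-1:lambda:path/2015-03-31/functions/arn:aws:lambda:eu-west-1:123456789012:function:authoriser/invocations
      authorizerResultTtlInSeconds: 300    # cache returned policies for 5 minutes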

Unlike the simplified example above, our API features multiple endpoints. Our current approach is to group closely related endpoints in the same endpoint handler, akin to small service handlers, rather than using one Lambda function per endpoint. Depending on your requirements or preferences you may end up using a different approach.
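
As a hypothetical sketch, using the SAM template syntax introduced below, such a grouping is simply one function subscribed to several related API events:

Resources:
  AccountsFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: main
      Runtime: go1.x
      CodeUri: ./handlers/accounts    # one binary serving a group of related endpoints
      Events:
        ListAccounts:
          Type: Api
          Properties:
            Path: /accounts
            Method: get
        GetAccount:
          Type: Api
          Properties:
            Path: /accounts/{id}
            Method: get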

The entire stack is defined in a template using SAM, short for Serverless Application Model, an extension to CloudFormation, which is deployed (by CodePipeline in our case) into the aforementioned development, staging and production environments.

API Gateway

AWS offers powerful ways to handle different versions of the same API Gateway resource through the use of ‘stages’. Here’s what the documentation has to say:

Stages enable robust version control of your API. For example, you can deploy an API to a test stage and a prod stage, and use the test stage as a test build and use the prod stage as a stable build. After the updates pass the test, you can promote the test stage to the prod stage. The promotion can be done by redeploying the API to the prod stage or updating a stage variable value from the stage name of test to that of prod.

The stage variables mentioned in the quote above allow you to use the API with different configurations for every stage. For instance you could use different Lambda function ARNs or pass configuration values to the functions. Again the official documentation provides further detail.
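
For instance, an integration URI in the API definition can reference a stage variable to pick a different Lambda alias per stage; a sketch, with placeholder names:

x-amazon-apigateway-integration:
  type: aws_proxy
  httpMethod: POST
  # ${stageVariables.lambdaAlias} resolves to e.g. DEV or PROD, depending on the stage
  uri: arn:aws:apigateway:eu-west-1:lambda:path/2015-03-31/functions/arn:aws:lambda:eu-west-1:123456789012:function:accounts:${stageVariables.lambdaAlias}/invocations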

Lambda

Lambda functions themselves have their own concept of versioning. Another look at the official documentation reveals the following:

Aliases enable you to abstract the process of promoting new Lambda function versions into production from the mapping of the Lambda function version and its event source.

In contrast, instead of specifying the function ARN, suppose that you specify an alias ARN in the notification configuration (for example, PROD alias ARN). As you promote new versions of your Lambda function into production, you only need to update the PROD alias to point to the latest stable version. You don’t need to update the notification configuration in Amazon S3.

With aliases you gain fine grained control over promoting new Lambda versions to specific stages. One caveat worth noting: since Lambda function versions are immutable once published, you may want to look into how this affects environment specific configuration such as database connection strings and other parameters (see these articles).
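
As a sketch, promoting the latest code to production with the AWS CLI might look like this (function name, alias and version number are hypothetical):

aws lambda publish-version --function-name accounts

aws lambda update-alias --function-name accounts --name PROD --function-version 5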

CloudFormation

CloudFormation is an infrastructure management tool and accompanying specification that lets you define a group of AWS resources as a ‘stack’. Resources are defined in templates written in JSON or YAML, and the CloudFormation tooling allows the infrastructure to be created, modified via change sets, and destroyed.

The package and deploy commands allow you to prepare local handler artefacts and apply stack updates:

  • [package](https://docs.aws.amazon.com/cli/latest/reference/cloudformation/package.html) will parse the template file (YAML or JSON), find the functions whose CodeUri points to a handler on the local filesystem, package and upload those handlers to S3, and then output a packaged template in which each CodeUri points to an S3 artefact.
  • [deploy](https://docs.aws.amazon.com/cli/latest/reference/cloudformation/deploy/index.html) uploads your packaged template to CloudFormation, creates a change set and executes it. This will attempt to migrate your target stack into a state that matches the template you provided, including the latest versions of your Lambda handlers, which have been packaged as S3 artefacts.

If everything goes well your stack will be updated to the exact specification of the packaged template that you provided. In case something goes wrong CloudFormation will roll back all changes and revert the stack to its previous state.
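
To make the effect of package concrete, it rewrites each function’s CodeUri from a local path to an S3 location, roughly as follows (bucket name and key are illustrative):

# in stack-template.yaml (local handler)
CodeUri: ./handlers/accounts

# in packaged-template.yaml, after running aws cloudformation package
CodeUri: s3://my-artefact-bucket/5b03e1c6e9dd0dcc7023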

If you’re interested in a similar tool that supports multiple providers, see Terraform.

SAM

Serverless Application Model is an attempt to simplify the definition of serverless applications by extending the CloudFormation specification. It adds three new resource types:

  • AWS::Serverless::Function
  • AWS::Serverless::Api
  • AWS::Serverless::SimpleTable

None of the above are new AWS primitives but rather wrappers around existing CloudFormation resources.
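
For example, a minimal (and hypothetical) AWS::Serverless::Api can pull in an external OpenAPI file and expands into the underlying API Gateway resources:

Parameters:
  StageName:
    Type: String    # the API Gateway stage to deploy, e.g. Dev

Resources:
  Api:
    Type: AWS::Serverless::Api
    Properties:
      StageName: !Ref StageName
      DefinitionUri: ./api-spec.yaml    # external OpenAPI definition, uploaded by ‘package’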

SAM aims to abstract away the complexity and verbosity of having to define your own API Gateway stages, deployments, permissions, roles, etc. But given that it’s a new extension these abstractions may leak when you don’t expect it, or, conversely, seem too opaque when you do need more control.

Fortunately there is work underway to introduce a series of important features including first class support for custom authorisers, CORS, and usage plans.

Local development environment

To run and test SAM based applications locally, awslabs released sam local, a CLI that can be used to invoke Lambda functions directly or to start a local mock API Gateway from a SAM template, which will invoke your functions to handle incoming requests. It does this by executing your local handlers in Docker containers that mimic the real Lambda execution environment. In case you were wondering, it also supports the recently announced official Go runtime for AWS Lambda.

The tool supports hot reloading, although in the case of Go you still need to recompile the binary yourself and must remember to target Linux by setting GOOS=linux.
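
A typical rebuild step (handler path hypothetical) is:

GOOS=linux go build -o main ./handlers/accounts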

Here is how you start a local API Gateway:

sam local start-api --template api-template.yaml
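
Individual functions can also be invoked directly with a sample event, for example (function name hypothetical):

sam local invoke AccountsFunction --template api-template.yaml --event event.json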

At the time of writing, sam local does come with a few limitations, namely:

  • missing support for custom authorisers; hopefully that will change once SAM introduces first class support for authorisers.
  • a bug preventing external OpenAPI YAML files from working (although JSON files seem to work, according to multiple user reports).

Single stack vs multi stack

People are still working out how best to use these tools in the real world, as evidenced by the issues in which users ask for infrastructure modelling advice. One example of a contentious area is the management of different environments, such as develop, staging and production.

There are two general approaches to this problem, using a single stack or multiple stacks.

API Gateway and Lambda require different configurations in single and multi stack setups.

The single stack option shares its API Gateway and Lambda functions across all environments, leveraging API stages, stage variables and Lambda aliases. The multi stack approach instead uses one stack per environment, each with its own resources, bypassing the indirection of API stages and Lambda aliases in order to keep environments separate.

Initially we looked into the single stack approach, which seemed more idiomatic because it made full use of the concepts API Gateway and Lambda provide.

Unfortunately, in practice this didn’t seem to work so well. Multi-stage support in SAM is still unclear and quirky, and it seems difficult to manage multiple API Gateway stages and Lambda aliases neatly in a single template. We also realised that our AWS resources would be tightly coupled across environments, not simply replicated.

This drove us to do further research, looking beyond the official documentation.

We came across this talk by Chris Munns, an AWS Serverless developer advocate, who recommends using a single stack if you’re a small team just starting out, and multiple stacks if you have a more complex setup with multiple teams, stringent permission requirements, or simply a preference for better separation of resources.

Lambda engineer Jacob Fuss, on the other hand, is more direct in his endorsement of the multi stack approach:

I do not recommend you (or anyone) to use aliases for various environments. The biggest concern I have with this is that you are running the risk of impacting prod with a change to test or dev. You are creating a bigger blast radius in the event something goes wrong with a deploy. My other concern would be around security of your functions. You may have to add credentials or policies specific to dev or test which will or could be replicated in prod. My suggestion is to split out dev, test, and prod into separate CloudFormation stacks so that each environment is isolated from the others. You then only have to manage the one CloudFormation template and can deploy it through your CI/CD system at an environment level. You will still only be managing one Lambda function (through SAM) but this setup will reduce blast radius for deployments and isolate your different environments’ functions and resources.

In the end we decided to go for a multi stack approach: one stack per environment, managed via a single template.

The key to a multi stack approach is to reuse the same SAM template and rely on template parameters to target each environment. This ensures that the stacks look exactly the same across environments.
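
For instance, the environment can be passed in as a parameter and used to derive per-environment resource names; a sketch:

Parameters:
  StageName:
    Type: String
    AllowedValues: [Dev, Staging, Prod]

Resources:
  AccountsFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: !Sub accounts-${StageName}    # e.g. accounts-Dev, accounts-Prod
      # remaining properties as in the earlier sketches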

The main apparent drawback is the disconnect between Lambda versions across environments. For example, even though the exact same code might be executed in the staging and production stacks, the actual Lambdas will be different resources in AWS, with different versions. We only know they are executing the same code because we used the same packaged template to deploy to both environments; that template pointed to the same artefacts in S3 and created Lambda versions with the same code on both stacks.

The template may also become more complex if we want to vary resource properties per environment, e.g. having a differently sized RDS instance.
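
One way to handle such variation without duplicating the template is a mapping keyed on the environment parameter; the values below are illustrative:

Mappings:
  EnvConfig:
    Dev:
      DBInstanceClass: db.t2.small
    Staging:
      DBInstanceClass: db.t2.medium
    Prod:
      DBInstanceClass: db.m4.large

Resources:
  Database:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceClass: !FindInMap [EnvConfig, !Ref StageName, DBInstanceClass]
      # other required RDS properties omitted for brevity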

Putting it all together

Having decided on a multi stack approach, our CI setup remained pretty straightforward: we use CodePipeline to automatically take the latest commit on master, run unit tests, compile the handlers and deploy the new versions to the develop stack, before manually approving deployments to the staging and production stacks.

The build stage uses CodeBuild to test and build the handlers, and to execute the aws cloudformation package command, which generates the packaged template:

aws cloudformation package --template-file stack-template.yaml --s3-bucket <s3-bucket> --output-template-file packaged-template.yaml

The packaged template is then passed on to the next stages where it is used by the CloudFormation integration to deploy to each environment, providing the StageName parameter via parameter-overrides:

aws cloudformation deploy --template-file packaged-template.yaml --stack-name <StackDev|StackStaging|StackProd> --capabilities CAPABILITY_NAMED_IAM --parameter-overrides StageName=<Dev|Staging|Prod>

Conclusion

After spending some time familiarising ourselves with more AWS concepts than we could ever have wished for, attempting a single stack approach with SAM, and browsing numerous GitHub issues, blog posts and talks, we eventually decided that the multi stack approach was the best way for us to reach our multi environment target. We hope this write up will help others facing similar questions. In the meantime we’ll keep an eye on the issues mentioned above.

What we didn’t cover

  • Integration testing via a dedicated stack, as part of our CI pipeline.
  • Traffic shifting for deployments via Lambda aliases. Note that aliases are still a good way to control deployments within an environment, just not so much for environment separation.

This investigation was carried out with Christian Klotz at 2PAx (a startup that aims to revolutionise how restaurants allocate covers). Thanks to Christian for helping with this article.
