How to load test a realtime multiplayer mobile game with AWS Lambda and Akka

Tencent’s Kings of Glory is one of the top grossing games worldwide in 2017 so far.

Over the last 12 months, we have seen a number of team-based multiplayer games hit the market as companies look to replicate the success of Tencent’s King of Glory (known as Arena of Valor in the west) which is one of the top grossing games in the world in 2017.

Even our partners Supercell has recently dipped into the genre with Brawl Stars, which offers a different take on the traditional MOBA (Multiplayer-Online-Battle-Arena) formula.

Supercell’s Brawl Stars offers a different experience to the traditional MOBA format, it is built with mobile in mind and prefers simple controls & maps, as well as shorter matches.

Here at Space Ape Games, we have been exploring ideas for a competitive multiplayer game, which is still in prototype so I can’t talk about it here. However, I can talk about how we use AWS Lambda to load test our homegrown networking stack.

Why Lambda?

The traditional approach of using EC2 servers to drive the load testing has several problems:

slow to start : any sizeable load test would require many EC2 instances to generate the desired load. Since it costs you to keep these EC2 instances around, it’s likely that you’ll only spawn them when you need to run a load test. Which means there’s a 10–15 mins lead time before every test just to wait for the EC2 instances to be ready.
wastage : when the load test is short-lived (say, < 1 hour) you can incur a lot of wastage because EC2 instances are billed by the hour with a minimum charge for one hour (per-second billing is coming to non-Windows EC2 instances in Oct 2017, which would address this problem).
hard to deploy updates : to update the load test code itself (perhaps to introduce new behaviours to bot players), you need to invest in the infrastructure for updating the load test code on the running EC2 instances. Whilst this doesn’t have to be difficult, after all, you probably already have a similar infrastructure in place for your game servers. Nonetheless, it’s yet another distraction that I would happily avoid.

AWS Lambda addresses all of these problems.

It does introduce its own limitations — especially the 5 min execution time limit. However, as I have written before, you can work around this limit by writing your Lambda function as a recursive function and taking advantage of container reuse to persist local state from one invocation to the next.

I’m a big fan of the work the Nordstrom guys have done with the serverless-artillery project. Unfortunately we’re not able to use it here because the game (the client app written in Unity3D) converses with the multiplayer server in a custom protocol via TCP, and in the future that conversation would happen over Reliable UDP too.

Nordstrom/serverless-artillery_Combine serverless with artillery and you get serverless-artillery for instant, cheap, and easy performance testing at…_github.com

Akka

Our multiplayer server is written in Scala with the Akka framework. To help us optimize our implementation we collect lots of metrics about the Akka system as well as the JVM — GC, heap, CPU usage, memory usage, etc.

The Kamon framework is a big help here, it made quick work of getting us insight into the running of the Akka system — no. of actors, no. of messages, how much time a message spends waiting in the mailbox, how much time we spend processing each message, etc.

All of these data points are sent to Wavefront, via Telegraf.

We collect lots of metrics about the Akka system and the JVM.

We also have a standalone Akka-based load test client that can simulate many concurrent players. Each player is modelled as an actor, which simulates the behaviour of the Unity3D game client during a match:

find a multiplayer match
connect to the multiplayer server and authenticate itself
play a 4 minute match, sending inputs at 15 times a second
report “client side” telemetries so we can collect the RTT (Round-Trip Time) as experienced by the client, and use these telemetries as a qualitative measure for our networking stack

In the load test client, we use the t-digest algorithm to minimise the memory footprint required to track the RTTs during a match. This allows us to simulate more concurrent players in a memory-constrained environment such as a Lambda function.

AWS Lambda + Akka

We can run the load test client inside a Java8 Lambda function and simulate 100 players per invocation. To simulate X concurrent players, we can create X/100 concurrent executions of the function via SNS (which has an one-invocation-per-message policy).

To create a gradual ramp up in load, a recursive Orchestrator function will gradually dial up the no. of current executions by publishing more messages into SNS, each triggering a new recursive load test client function.

A LoadTest function that is triggered by API Gateway allows us to easily kick off a load test from a Jenkins pipeline.

Using the push-pull pattern (see this post for detail), we can track the progress of all the concurrent load test client functions. When they have all finished simulating their matches, we’ll kick off the Aggregator function.

The Aggregator function would collect the RTT metrics published by the load test clients and produce a report detailing the various percentile RTTs.

{"loadTestId": "62db5790-da53-4b49-b673-0f60e891252a","status": "completed","successful": 43,"failed": 2,"metrics": {"client-interval": {"count": 7430209,"min": 0,"max": 140,"percentile80": 70.000000193967,"percentile90": 70.00001559848,"percentile99": 71.000000496589,"percentile99point9": 80.000690623146,"percentile99point99": 86.123610689566},"RTT": {"count": 744339,"min": 70,"max": 320,"percentile80": 134.94761466541,"percentile90": 142.64720935496,"percentile99": 155.30086042676,"percentile99point9": 164.46137375328,"percentile99point99": 175.90215268392}}}

Like what you’re reading but want more help? I’m happy to offer my services as an independent consultant and help you with your serverless project — architecture reviews, code reviews, building proof-of-concepts, or offer advice on leading practices and tools.

I’m based in London, UK and currently the only UK-based AWS Serverless Hero. I have nearly 10 years of experience with running production workloads in AWS at scale. I operate predominantly in the UK but I’m open to travelling for engagements that are longer than a week. To see how we might be able to work together, tell me more about the problems you are trying to solve here.

I can also run an in-house workshops to help you get production-ready with your serverless architecture. You can find out more about the two-day workshop here, which takes you from the basics of AWS Lambda all the way through to common operational patterns for log aggregation, distribution tracing and security best practices.

If you prefer to study at your own pace, then you can also find all the same content of the workshop as a video course I have produced for Manning. We will cover topics including:

authentication & authorization with API Gateway & Cognito
testing & running functions locally
CI/CD
log aggregation
monitoring best practices
distributed tracing with X-Ray
tracking correlation IDs
performance & cost optimization
error handling
config management
canary deployment
VPC
security
leading practices for Lambda, Kinesis, and API Gateway

You can also get 40% off the face price with the code ytcui. Hurry though, this discount is only available while we’re in Manning’s Early Access Program (MEAP).