How to Store Data with a Few Million Akka Actors

Dictionary-like data structures are very common in software engineering: storing indexed data, accessing stateful entities by ID, and so on. There are many ways and technologies to implement them; in this article I demonstrate a few ways Akka actors can be used to store and access indexed data or entities.

I’m going to compare four different approaches to storing dictionary-like data:

In-memory Scala Map
In-memory Akka actor
Persistent cluster singleton actor
Persistent entity actors with cluster sharding

The benchmark consists of two parts:

We fill a dictionary with about 2.5 million random words.
We look up 5 million random words.

In-memory Scala Map

A simple solution: populate a plain Scala Map with two and a half million random values.
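A minimal sketch of what such a baseline could look like. The word generator, the word length and the choice of value (the word's length) are my assumptions for illustration, not the benchmark's actual code.

```scala
import scala.util.Random

object MapBenchmark {
  // Hypothetical helper: a random lowercase "word" of the given length.
  def randomWord(rnd: Random, length: Int = 5): String =
    Seq.fill(length)(('a' + rnd.nextInt(26)).toChar).mkString

  // Inserts n random words; duplicates simply overwrite the earlier entry,
  // so the resulting map can hold fewer than n keys.
  def populate(n: Int, rnd: Random): Map[String, Int] =
    Iterator.fill(n)(randomWord(rnd)).foldLeft(Map.empty[String, Int]) {
      (dict, word) => dict.updated(word, word.length)
    }

  // Looks up n random words and returns how many of them were found.
  def lookups(dict: Map[String, Int], n: Int, rnd: Random): Int =
    Iterator.fill(n)(randomWord(rnd)).count(dict.contains)
}
```

Calling `MapBenchmark.populate(2500000, new Random())` followed by `MapBenchmark.lookups(dict, 5000000, new Random())` would mirror the two benchmark phases.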

Running time: 5.9 seconds to populate the dictionary, 2.4 seconds to look up the keys.

Memory: 2.2 GB heap is used, 430 MB after garbage collection.

Use case: this solution is listed only as a reference point. It has very little practical value because it is non-persistent and single-threaded, but it shows how much memory and processing time is required to generate and store the pure data.

In-memory Akka actor

A single stateful actor holds the dictionary; separate messages are used to insert and look up values.
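Such an actor could be sketched as follows with the classic Akka actor API. The message names (`Insert`, `Lookup`, `Inserted`, `LookupResult`) and the `String` value type are assumptions of mine.

```scala
import akka.actor.{Actor, Props}

object DictionaryActor {
  // Hypothetical message protocol.
  final case class Insert(key: String, value: String)
  final case class Lookup(key: String)
  final case class LookupResult(key: String, value: Option[String])
  case object Inserted

  def props: Props = Props(new DictionaryActor)
}

class DictionaryActor extends Actor {
  import DictionaryActor._

  // The state is only touched inside receive, one message at a time,
  // so no locking is needed.
  private var entries = Map.empty[String, String]

  override def receive: Receive = {
    case Insert(key, value) =>
      entries = entries.updated(key, value)
      sender() ! Inserted
    case Lookup(key) =>
      sender() ! LookupResult(key, entries.get(key))
  }
}
```

Replying to every message lets the caller know when an insert has been processed, which is what makes backpressure from the client side possible.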

The benchmark app uses Akka Streams to populate and to query the state.
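One way to drive an actor from a stream is `mapAsync` combined with the ask pattern. This is a sketch under assumptions: `StreamDriver`, `sendAll` and the 5 second timeout are hypothetical, and the target actor is expected to reply to every message.

```scala
import akka.Done
import akka.actor.ActorRef
import akka.pattern.ask
import akka.stream.Materializer
import akka.stream.scaladsl.{Sink, Source}
import akka.util.Timeout
import scala.concurrent.Future
import scala.concurrent.duration._

object StreamDriver {
  // Sends every message to `target` with ask and completes once all replies
  // have arrived. `parallelism` bounds the number of in-flight asks, so the
  // stream never floods the actor's mailbox.
  def sendAll(messages: Iterator[Any], target: ActorRef, parallelism: Int)(
      implicit mat: Materializer): Future[Done] = {
    implicit val timeout: Timeout = Timeout(5.seconds) // assumed value
    Source
      .fromIterator(() => messages)
      .mapAsync(parallelism)(msg => target ? msg)
      .runWith(Sink.ignore)
  }
}
```

With a single actor on the receiving end, a higher `parallelism` only keeps the mailbox filled; the messages are still processed one at a time.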

Running time: 14.3 seconds to populate the dictionary, 18.7 seconds to look up the keys.

Memory: 2.3 GB heap is used, 420 MB after garbage collection.

Pros: simple and easy to implement, thread-safe

Cons: non-persistent, single-threaded message processing

Use case: simple caching

Persistent cluster singleton actor

A persistent, event-sourced version of the previous in-memory actor, backed by a Cassandra database.
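An event-sourced variant could look roughly like this with the classic `PersistentActor` API. The command and event names and the `persistenceId` are assumptions; the journal (Cassandra in this benchmark) is selected in the Akka configuration, not in the actor code.

```scala
import akka.persistence.PersistentActor

object PersistentDictionary {
  final case class Insert(key: String, value: String)   // command
  final case class Lookup(key: String)                  // query, not persisted
  final case class Inserted(key: String, value: String) // event written to the journal
  final case class LookupResult(key: String, value: Option[String])
}

class PersistentDictionary extends PersistentActor {
  import PersistentDictionary._

  override def persistenceId: String = "dictionary-singleton" // assumed id

  private var entries = Map.empty[String, String]

  private def applyEvent(event: Inserted): Unit =
    entries = entries.updated(event.key, event.value)

  // Replayed on start-up to rebuild the in-memory state from the journal.
  override def receiveRecover: Receive = {
    case event: Inserted => applyEvent(event)
  }

  override def receiveCommand: Receive = {
    case Insert(key, value) =>
      // The handler runs only after the journal has confirmed the write.
      persist(Inserted(key, value)) { event =>
        applyEvent(event)
        sender() ! event
      }
    case Lookup(key) =>
      sender() ! LookupResult(key, entries.get(key))
  }
}
```

Because `persist` holds back further commands until the event is stored, every insert pays for a journal round trip, while lookups are served from memory.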

The benchmark app is almost identical to the in-memory actor version; the only difference is how the singleton actor is created.
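Creating a cluster singleton with the classic API usually involves a manager and a proxy, roughly as below. The actor names `dictionary` and `dictionaryProxy` are assumptions, and the actor system must be configured with the cluster actor provider.

```scala
import akka.actor.{ActorRef, ActorSystem, PoisonPill, Props}
import akka.cluster.singleton.{
  ClusterSingletonManager,
  ClusterSingletonManagerSettings,
  ClusterSingletonProxy,
  ClusterSingletonProxySettings
}

object SingletonSetup {
  // Starts the singleton manager on this node and returns a proxy that
  // routes messages to wherever the singleton currently lives.
  def start(system: ActorSystem, singletonProps: Props): ActorRef = {
    system.actorOf(
      ClusterSingletonManager.props(
        singletonProps = singletonProps,
        terminationMessage = PoisonPill,
        settings = ClusterSingletonManagerSettings(system)),
      name = "dictionary") // assumed name

    system.actorOf(
      ClusterSingletonProxy.props(
        singletonManagerPath = "/user/dictionary",
        settings = ClusterSingletonProxySettings(system)),
      name = "dictionaryProxy") // assumed name
  }
}
```

The rest of the app can then send its messages to the returned proxy exactly as it would to a local actor reference.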

Running time: 1665.8 seconds to populate the dictionary, 30.2 seconds to look up the keys.

Memory: 800 MB heap is used, 430 MB after garbage collection.

Pros: easy to implement, persistent, thread-safe, fast lookups

Cons: single-threaded, can become a memory bottleneck, slow to populate

Use case: centralised naming service

Persistent entity actors with cluster sharding

A persistent, event-sourced entity actor for each dictionary entry. The actor holds an optional value: None if the entry doesn’t exist, Some(X) if it does.
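A per-entry entity could be sketched like this. The message names are hypothetical, and deriving `persistenceId` from the actor name relies on the convention that cluster sharding names each entity actor after its entity id.

```scala
import akka.persistence.PersistentActor

object EntryEntity {
  final case class SetValue(key: String, value: String) // command
  final case class GetValue(key: String)                // query
  final case class ValueSet(value: String)              // persisted event
  final case class CurrentValue(value: Option[String])
}

class EntryEntity extends PersistentActor {
  import EntryEntity._

  // Under cluster sharding the actor's name is the entity id (the key).
  override def persistenceId: String = "entry-" + self.path.name

  // None until a value has been stored for this key.
  private var value: Option[String] = None

  override def receiveRecover: Receive = {
    case ValueSet(v) => value = Some(v)
  }

  override def receiveCommand: Receive = {
    case SetValue(_, v) =>
      persist(ValueSet(v)) { event =>
        value = Some(event.value)
        sender() ! event
      }
    case GetValue(_) =>
      sender() ! CurrentValue(value)
  }
}
```

Note that looking up a key that was never inserted still starts an entity for it, which then answers with `CurrentValue(None)`.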

Sharding needs to be set up so that the entityId and shardId are extracted from each incoming message.
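A sketch of that setup with the classic Cluster Sharding API. The `EntryMessage` trait, the type name `Entry` and the shard count of 100 are assumptions; the only requirement is that every message carries the key it belongs to.

```scala
import akka.actor.{ActorRef, ActorSystem, Props}
import akka.cluster.sharding.{ClusterSharding, ClusterShardingSettings, ShardRegion}

object EntrySharding {
  // Hypothetical marker trait: every message exposes its dictionary key.
  trait EntryMessage { def key: String }

  val numberOfShards = 100 // assumed; should be fixed for the cluster's lifetime

  // The entity id is the key itself; the message is passed on unchanged.
  val extractEntityId: ShardRegion.ExtractEntityId = {
    case msg: EntryMessage => (msg.key, msg)
  }

  // Stable, non-negative shard id derived from the key's hash code.
  def shardIdFor(key: String): String =
    math.abs(key.hashCode % numberOfShards).toString

  val extractShardId: ShardRegion.ExtractShardId = {
    case msg: EntryMessage => shardIdFor(msg.key)
  }

  // Returns the shard region; send all entity messages to this reference.
  def start(system: ActorSystem, entityProps: Props): ActorRef =
    ClusterSharding(system).start(
      typeName = "Entry",
      entityProps = entityProps,
      settings = ClusterShardingSettings(system),
      extractEntityId = extractEntityId,
      extractShardId = extractShardId)
}
```

The shard region starts entity actors on demand, so the sender never needs to know on which node a given key lives.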

In this version of the benchmark app we can populate the map in parallel.

Running time: 605 seconds to populate the dictionary, 280 seconds to look up the keys.

Memory: 5 GB heap is used, 4.6 GB after garbage collection.

Pros: persistent, multi-threaded, faster population than the persistent singleton

Cons: slower lookups, higher memory usage, easy to over-engineer

Use case: a virtually unlimited number of stateful actors with any type of behaviour

Summary

It’s worth highlighting how lightweight Akka is: in the last benchmark about five million actors were created and they used only 4.6 GB of memory, which is less than 1 KB per actor.
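The per-actor figure can be checked with a quick calculation. This assumes the whole retained heap is attributed to the actors and their state and that a GB is counted as 1024³ bytes; `MemoryPerActor` is a hypothetical helper.

```scala
object MemoryPerActor {
  // Average heap bytes per actor, given the heap size in GB and the actor count.
  def bytesPerActor(heapGb: Double, actors: Long): Double =
    heapGb * 1024 * 1024 * 1024 / actors
}
```

`MemoryPerActor.bytesPerActor(4.6, 5000000L)` comes out at roughly 988 bytes, just under the 1024 bytes of a kilobyte, and that number includes each entity's own state.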

There’s no silver bullet; always pick the most appropriate approach for solving your problem.

Thanks for reading; your comments are appreciated.

Tamas