Metaparticle/Storage

Simple persistence for distributed NodeJS apps

As excited as I’ve been about the mainstreaming of containerized applications and container orchestration, I’m honestly even more excited about democratizing distributed systems to reach new audiences of developers. I’ve talked about this topic several times, but today I’m excited to discuss Metaparticle/Storage, a new library for implicit persistence in NodeJS.

Metaparticle/Storage makes persistence simple by making it implicit and automatic. Instead of explicitly making calls to storage infrastructure, you simply assign to local variables and the library takes care of the underlying storage management. Instead of worrying about parallel, replicated servers and data races, Metaparticle/Storage automatically handles conflict detection, rollback and resolution.

Rather than dive into the details of the library, I thought I would instead walk you through the reasons for its existence and the problems it’s trying to solve.

An example: a request counting server

Throughout this discussion we’ll focus on a simple server that counts and reports the number of requests it receives. The simplest version of such a server can be written in a few lines of JavaScript:

// Simple HTTP Server example, keeps track of the number
// of requests and reports back over HTTP
var http = require('http');

var count = 0;
var server = http.createServer((request, response) => {
  count++;
  var suffix = (count == 1 ? ' request.' : ' requests.');
  response.end('There have been ' + count + suffix);
});

server.listen(8090, (err) => {
  if (err) {
    console.log('error starting server', err)
  }
  console.log('server is listening on http://localhost:8090')
});

This server is indeed quite simple, and it does keep track of the number of requests served, until the server crashes or needs to be restarted. Since the count is simply stored in memory, it is lost whenever the process dies. To preserve it, we need some sort of persistence.

Adding Persistence with Redis

To solve this, we’ll integrate the Redis key-value store into our application. There’s nothing magical about Redis; other storage layers are equally useful and look similar in code.

// Simple HTTP Server example, keeps track of the number
// of requests and reports back over HTTP
var http = require('http');
var redis = require('node-redis-client');

var count = 0;

// Connect to the Redis host named in the environment
var host = process.env['REDIS_HOST'];
var opts = {host: host};
var client = new redis(opts);
client.on('connect', function () {
  console.log('connected');
});

server.listen(8090, (err) => {
  if (err) {
    console.log('error starting server', err)
  }

  console.log('server is listening on http://localhost:8090');
});
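The listing above shows the Redis setup but not the request handler, which is where the explicit GET and SET calls live. As a rough sketch of the shape such a handler takes, the following uses a hypothetical in-memory stand-in (FakeStore, with callback-style get and set) instead of a real Redis client, so it runs without a Redis server. The names FakeStore and countRequest are assumptions made for illustration and are not part of any Redis library.

```javascript
// Sketch only: FakeStore is a hypothetical in-memory stand-in for a
// key-value client with callback-style get/set. It is not the Redis API.
var http = require('http');

function FakeStore() {
  this.data = {};
}
FakeStore.prototype.get = function (key, callback) {
  var self = this;
  // Answer asynchronously, as a networked store would.
  setImmediate(function () { callback(null, self.data[key]); });
};
FakeStore.prototype.set = function (key, value, callback) {
  var self = this;
  setImmediate(function () {
    self.data[key] = value;
    callback(null);
  });
};

var store = new FakeStore();

// Read the count, increment it, and write it back: two explicit
// asynchronous calls take the place of a plain count++.
function countRequest(store, callback) {
  store.get('count', function (err, value) {
    if (err) { return callback(err); }
    var count = (value || 0) + 1;
    store.set('count', count, function (err) {
      if (err) { return callback(err); }
      callback(null, count);
    });
  });
}

var server = http.createServer(function (request, response) {
  countRequest(store, function (err, count) {
    if (err) {
      response.statusCode = 500;
      return response.end('storage error');
    }
    var suffix = (count == 1 ? ' request.' : ' requests.');
    response.end('There have been ' + count + suffix);
  });
});
```

Calling server.listen(8090) as in the earlier listings would start serving. Even in this reduced form, the one-line count++ has become a nested pair of callbacks with two error paths.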

There are several things to note from adding persistence into our server:

First, the code has grown nearly 2x in size.

Second, and more worryingly, the introduction of a persistent store means that our code now contains explicit function calls to things like SET and GET. It is no longer just implicitly manipulating data using the standard language (e.g. count = count + 1). This means that persistent code looks different than “normal” code, which breaks the flow and introduces barriers to those who are just starting to learn to code.

Third, the asynchronous nature of these explicit calls not only makes the code longer, but harder to understand as well. Again, this is a barrier to developers becoming successful distributed system engineers.

Problems from distributed, replicated servers

Unfortunately, even with the persistence handled, our application is still not safe to scale out to multiple containers. To understand why this is the case, consider that there is a race between the read of a value and the subsequent write of that value. Consider what happens when two different instances of our application both read the same value, increment it by one and then both write the (same) new value back into the persistence layer. We will have serviced two user requests, but only ever incremented the count by one. One of the requests will have been lost. This is a classic example of a read-update-write race.
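The race described above can be reproduced in a few lines. This is a minimal sketch in which a plain object stands in for the shared persistence layer and two concurrent async calls stand in for two replicas; the function names and the 5 ms delay are assumptions chosen only to make the interleaving visible.

```javascript
// Sketch only: a plain object stands in for the shared persistence
// layer, and two async calls stand in for two server replicas.

// Simulated network round trip to the store.
function delay(ms) {
  return new Promise(function (resolve) { setTimeout(resolve, ms); });
}

async function readCount(storage) {
  await delay(5);
  return storage.count;
}

async function writeCount(storage, value) {
  await delay(5);
  storage.count = value;
}

// One replica handling one request: read, increment, write back.
async function handleRequest(storage) {
  var current = await readCount(storage);
  await writeCount(storage, current + 1);
}

// Two replicas each handle one request at the same time.
async function raceTwoReplicas() {
  var storage = { count: 0 };
  await Promise.all([handleRequest(storage), handleRequest(storage)]);
  return storage.count;
}

raceTwoReplicas().then(function (finalCount) {
  // Both replicas read 0 before either one wrote, so both write 1
  // and one increment is lost.
  console.log('requests served: 2, count stored: ' + finalCount);
});
```

Nothing in either replica is wrong when viewed in isolation, which is what makes this kind of bug hard to spot.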

Of course, we can solve this problem by adding code to our simple server that defines an atomic transaction connecting the read and the write of the data. But again, this transaction adds complexity to the code and reduces the number of developers who can successfully build such a system. (Remember, the goal at the top was to broaden and democratize the set of successful application developers.)
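One common way to make the read and the write atomic is optimistic concurrency: remember the version you read, and let the write succeed only if nobody else has written since. The sketch below shows that idea against a hypothetical in-memory VersionedStore; the store and its compareAndSet method are assumptions for illustration. With Redis itself, this role is usually played by a WATCH/MULTI/EXEC transaction or, for a simple counter, the atomic INCR command.

```javascript
// Sketch only: VersionedStore is a hypothetical in-memory store whose
// compareAndSet succeeds only if nobody wrote since the caller's read.
function delay(ms) {
  return new Promise(function (resolve) { setTimeout(resolve, ms); });
}

function VersionedStore() {
  this.value = 0;
  this.version = 0;
}
VersionedStore.prototype.read = async function () {
  await delay(5);
  return { value: this.value, version: this.version };
};
VersionedStore.prototype.compareAndSet = async function (expectedVersion, newValue) {
  await delay(5);
  // The check and the write run with no await between them, so no
  // other caller can slip in between the two steps.
  if (this.version !== expectedVersion) { return false; }
  this.value = newValue;
  this.version++;
  return true;
};

// Retry the read-increment-write cycle until the write is accepted.
async function incrementAtomically(store) {
  for (;;) {
    var snapshot = await store.read();
    var accepted = await store.compareAndSet(snapshot.version, snapshot.value + 1);
    if (accepted) { return snapshot.value + 1; }
  }
}

// Two replicas each handle one request at the same time.
async function twoReplicas() {
  var store = new VersionedStore();
  await Promise.all([incrementAtomically(store), incrementAtomically(store)]);
  return store.value;
}

twoReplicas().then(function (finalCount) {
  // The losing replica sees its write rejected, re-reads 1, and writes 2.
  console.log('requests served: 2, count stored: ' + finalCount);
});
```

The count is now correct, but the retry loop, version bookkeeping, and extra failure path are exactly the complexity the paragraph above warns about.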

Using Metaparticle/Storage

Rather than consider what the full, atomic, multiple-container safe server looks like, let’s consider instead what this example looks like when implemented using Metaparticle/Storage:

// Simple HTTP Server example, keeps track of the number of
// requests and reports back over HTTP
var http = require('http');
var mp = require('@metaparticle/storage');

mp.setStorage('redis');

server.listen(8090, (err) => {
  if (err) {
    console.log('error starting server', err)
  }

  console.log('server is listening on http://localhost:8090');
});

Note that this code is shorter than the explicit persistence example above. It is also easier to read (and to write) due to its implicit rather than explicit persistence. Yet the above code both uses Redis for persistence and ensures that multiple simultaneous reads and writes to storage will not corrupt the data. So what is actually going on?

Implementation details

It turns out that JavaScript has some special features. One of the coolest is the ability to create shadow or proxy objects. These proxy objects look like real JavaScript objects, but they route all get/set operations through a pre-defined handler. When you use metaparticle.scoped(scope, fn) to create a new data scope, it returns a proxy object that intercepts all calls to read and write. These operations form the basis of the transaction which the system will apply or roll back. The Metaparticle/Storage library observes all of these calls to set new values, and automatically persists the new data into the persistence layer. This allows an implicit update like i = i + 1 to be intercepted, and at the end of the scoped operation the changes to the proxy object are persisted to storage. Additionally, because the library controls access to storage, it can detect conflicts, roll back, and re-apply the same code multiple times to ensure a thread-safe concurrent variable update.
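To make the mechanism concrete, here is a minimal sketch of the technique using the standard JavaScript Proxy object. It is not the Metaparticle/Storage implementation: the scoped function below takes a store argument and differs from the library’s signature, and createStore, load, and commit are hypothetical in-memory helpers invented for this illustration.

```javascript
// Sketch only: scoped, createStore, load, and commit are hypothetical
// and illustrate the technique. This is not the library's code.
function delay(ms) {
  return new Promise(function (resolve) { setTimeout(resolve, ms); });
}

function createStore() {
  return { data: {}, version: 0 };
}

// Load a private copy of the stored data along with its version.
async function load(store) {
  await delay(5);
  return { data: Object.assign({}, store.data), version: store.version };
}

// Commit only if no one else has committed since the copy was taken.
async function commit(store, data, expectedVersion) {
  await delay(5);
  if (store.version !== expectedVersion) { return false; }
  store.data = data;
  store.version++;
  return true;
}

async function scoped(store, fn) {
  for (;;) {
    var snapshot = await load(store);
    var dirty = false;
    // The Proxy routes every property read and write through these traps.
    var scope = new Proxy(snapshot.data, {
      get: function (target, key) { return target[key]; },
      set: function (target, key, value) {
        target[key] = value;
        dirty = true; // remember that something needs persisting
        return true;
      }
    });
    var result = fn(scope);
    // Nothing was written, so there is nothing to persist.
    if (!dirty) { return result; }
    // On a conflict the private copy is discarded (the rollback)
    // and fn is run again against fresh data.
    if (await commit(store, snapshot.data, snapshot.version)) { return result; }
  }
}

// Ordinary-looking code: no explicit storage calls.
function countRequest(scope) {
  scope.count = (scope.count || 0) + 1;
  return scope.count;
}

async function twoReplicas() {
  var store = createStore();
  var counts = await Promise.all([scoped(store, countRequest), scoped(store, countRequest)]);
  return { counts: counts, stored: store.data.count };
}

twoReplicas().then(function (result) {
  console.log('counts returned: ' + result.counts + ', stored: ' + result.stored);
});
```

The function passed to scoped contains only ordinary assignments. The set trap is what turns those assignments into a pending transaction, and the version check at commit time is what turns a data race into a retry.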

No free lunch!

As with everything, of course, there is no free lunch. Because the library may re-apply your code after a conflict, the function in the scoped block _must_ be idempotent. That is, it can be called repeatedly with no additional side effects.
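A small sketch shows why this matters. Here runWithOneConflict is a hypothetical helper that mimics a scoped call whose first attempt hits a conflict and is rolled back; emailsSent is likewise an invented stand-in for any side effect that lives outside the scope.

```javascript
// Sketch only: runWithOneConflict is a hypothetical helper that mimics
// a scoped call whose first attempt conflicts and is rolled back.
function runWithOneConflict(stored, fn) {
  // First attempt: works on a copy that is thrown away (the rollback).
  fn(Object.assign({}, stored));
  // Second attempt: works on a fresh copy that is then committed.
  var scope = Object.assign({}, stored);
  var result = fn(scope);
  Object.assign(stored, scope);
  return result;
}

var emailsSent = [];

// Not safe to re-run: the push to emailsSent is outside the scope,
// so the rollback cannot undo it.
function notSafe(scope) {
  scope.count = (scope.count || 0) + 1;
  emailsSent.push('request #' + scope.count);
  return scope.count;
}

// Safe to re-run: it only touches the scope, which is rolled back.
function safe(scope) {
  scope.count = (scope.count || 0) + 1;
  return scope.count;
}

var storedA = {};
runWithOneConflict(storedA, notSafe);
// The count is 1, but two emails went out for that one request.
console.log('count: ' + storedA.count + ', emails sent: ' + emailsSent.length);

var storedB = {};
runWithOneConflict(storedB, safe);
console.log('count: ' + storedB.count);
```

Anything the function does to the scope can be undone by discarding the copy; anything it does to the outside world cannot.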

Summary

Well, it’s time to wrap this up; this post is getting a little long. I hope you’ve found it interesting. This is really just the beginning of the journey. If you are interested in helping with Metaparticle/Storage, the code is out there on GitHub today. Please come participate, file issues, or otherwise contact me in the usual ways.

Best!!

Brendan