Lessons Learned Building Distributed Systems with CQRS and Event Sourcing

Several years ago, I had an idea, or more of a drive really. Maybe even an obsession. I wanted to build the holy grail of efficiency in development. It didn’t have to be for everyone. I wanted it for me.

I wanted to build highly scalable, fault-tolerant systems made up of simple services that were maintainable, malleable, withstood the test of time, and, importantly, were easy for humans to understand.

I wanted my services to be easily testable, and I didn’t want to have to think about how they ran in production.

I wanted myself and my teammates to be able to deliver high quality code quickly. When code is delivered more quickly, that means your business can experiment more quickly, and therefore, find winning ideas more quickly.

This is basically the entire point of the book “The Lean Startup”.

Build. Measure. Learn. Repeat. And so it continues, from the startup into the enterprise.

To win, you need to build, measure, learn, and repeat and repeat and repeat.

That cycle gives our businesses a competitive advantage.

And hopefully, that means everyone makes more money, which means we (or maybe me) can focus on what we really want: enjoying life, rock climbing, traveling, writing, spending time with my beautiful girlfriend, and let’s be honest, probably programming some more... or too much.

However, we cannot just be faster. We cannot sacrifice quality.

Our code powers basically all the things, and all the things can’t be breaking all the time! Especially when I’m at the beach!

Turns out, such things are easier said than done. It’s not easy to go from reading Domain-Driven Design to knowing what the hell it actually means in practice.

I’ve heard others joke that the time spent understanding DDD’s techniques and learning to implement them is equivalent to earning a PhD.

I first read DDD in 2007 when I was a sophomore in college. I wouldn’t even call myself a DDD expert now. However, there are some REALLY useful concepts in DDD, and today’s languages are much more powerful and expressive than they were in 2003, when the book and its examples were written.

Only now, several years later, have I figured out how to do it well. More importantly, I’ve run a few production systems using these techniques and formalized my views and approaches on the matter so I can share what I’ve learned with others!

Here’s my view: it is complicated, but mostly because of the signal-to-noise ratio.

There’s a whole lot of noise. Even setting modeling problems aside, there are literally hundreds of libraries and approaches to building microservices, because the term basically just means “a really small and focused service”.

I want to be the strong signal in that noise that you can follow to success.

Additionally, just having a bunch of services means a whole bunch of new headaches in general!

More services to test!
More integrations to test!
More deployments to deploy!
More databases to provision!
More caches to invalidate!
More everything!

The act of making simple services makes your architecture and operations more complex.

It just does.

It’s how it works.

However…

That doesn’t mean it needs to be complicated.

Quick analogy: does anyone remember the CSS Sliding Doors technique?

That was back when CSS was awful and it took about 100 lines of code to make rounded corners.

Now, it’s one line of code.

Things tend to get easier over the years.

And yet, it’s still hard to find a consensus on how to build microservices!

But don’t fret, I’m here to direct you to three key areas that will help you tackle the multi-headed beast.

1. Learn more design patterns!

The next thing in software engineering always stands on the shoulders of its predecessors. Without the minds and thoughts of the thousands of engineers who came before, the next set of thoughts and patterns would not be possible.

If you don’t love patterns already, well, I’m surprised you’re an engineer! If you come across a problem, chances are someone has already solved it, or at least some variation of it.

Design patterns are essential to building highly scalable, fault-tolerant systems that are human-friendly!

In my everyday work, I make use of all sorts of patterns all the time! Here are several awesome patterns that come to mind: Event Sourcing, the Repository pattern, Singleton, Factory, CQRS, Circuit Breakers, POJOs, and more!

Beyond the classic patterns, I’ve also come across some “microservice” patterns over the years that help me think about designing large enterprise systems from a higher-level view.

There is no better resource than the classic “Blue Book” to get you started with the concepts. It’s so popular that it goes by a color and other engineers will know what you are talking about: Domain-Driven Design: Tackling Complexity in the Heart of Software, by Eric Evans.

As for microservice-specific patterns, here are some of the most common ones I use: These Five Microservice Patterns Will Make You a Better Engineer.

Moving on…

2. Objects are important, but you know what’s also really important? Events and Commands.

Commands are how things happen. Events are what has happened.

Commands and Events are both messages.

Entities are the things events happen to. Aggregates are collections of related entities.

It turns out that all of these things are really, really, important.

Oftentimes, with the focus on OO principles, you’ll hear people talking about Entities, and maybe Aggregates, but the Events and Commands get lost! It’s even worse when you are just updating a database with the new state: all of the history of the world you’ve modeled is lost with every UPDATE.
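To make that loss concrete, here is a minimal sketch in plain JavaScript with no database. The product, its prices, and the event names (`catalog.product.priced`, `catalog.product.repriced`) are hypothetical. The same two price changes are recorded twice: once by overwriting state, UPDATE-style, and once by appending immutable events.

```javascript
// Hypothetical example: a product's price changes twice.

// 1) UPDATE-style: each change overwrites the previous state.
const row = { id: 'p1', price: 10 }
row.price = 12 // the fact that it was 10 is gone
row.price = 9  // ...and now so is the fact that it was ever 12

// 2) Event-style: each change is appended as an immutable fact.
const events = []
const record = (type, payload) =>
  events.push(Object.freeze({ type, payload: Object.freeze(payload) }))

record('catalog.product.priced', { id: 'p1', price: 10 })
record('catalog.product.repriced', { id: 'p1', price: 12 })
record('catalog.product.repriced', { id: 'p1', price: 9 })

// The current price can still be derived from the history (the last event wins)...
const currentPrice = events.reduce((_, e) => e.payload.price, undefined)
// ...and the history itself is still there to answer questions nobody asked yet.
const priceHistory = events.map((e) => e.payload.price)

console.log(row.price)    // 9
console.log(currentPrice) // 9
console.log(priceHistory) // [ 10, 12, 9 ]
```

Both approaches agree on the current price; only the event log can still tell you how it got there.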

I find it very sad. 😢

Before this was understood, ORMs were popular, and some have since called object-relational mapping the “Vietnam of Computer Science”. We didn’t win the war with ORMs.

The Vietnam of Computer Science · Ted Neward’s Blog

Object-Relational Mapping is the Vietnam of Computer Science · Coding Horror

Many proponents of Domain-Driven Design evolved their thinking over the years, moving away from a focus on the Nouns and ushering in a new era of Events. Check out the works of Greg Young, Udi Dahan, and Rinat Abdullin. Even the Google Group “DDD” was eventually renamed to CQRS/ES+AR (Command Query Responsibility Segregation with Event Sourcing on Aggregate Roots)!

Events are the language of distributed systems… And life really.

When modeling the world, you need to model the Events and Commands of the world as well. Events and commands work really well as the language of a distributed system.

When I visualize systems, I like to imagine paper forms being filled out and passed between human actors. That is essentially a distributed system, and it’s an easy way to think about eventual consistency.

For example, the command

inventory.product.catalog

yields the event

inventory.product.cataloged

and any interested service can react to it:

bus.on('inventory.product.cataloged', reactToTheFactThatThisEventHappened)
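The naming convention (an imperative command goes in, a past-tense event comes out) can be sketched without any broker. The `InMemoryBus` class below is hypothetical and is not the servicebus API. It uses Node’s built-in `EventEmitter` to stand in for a queue, where each command has exactly one handler, and a topic, where each event can have any number of subscribers. The `sku` field is also just an assumed payload.

```javascript
const { EventEmitter } = require('events')

// Hypothetical stand-in for a message broker; not the servicebus API.
class InMemoryBus {
  constructor() {
    this.commands = new Map()        // command name -> its single handler
    this.events = new EventEmitter() // event name -> any number of subscribers
  }
  listen(command, handler) {
    this.commands.set(command, handler)
  }
  send(command, payload) {
    const handler = this.commands.get(command)
    if (!handler) throw new Error(`No handler for command ${command}`)
    handler(payload)
  }
  publish(event, payload) {
    this.events.emit(event, payload)
  }
  on(event, handler) {
    this.events.on(event, handler)
  }
}

const bus = new InMemoryBus()
const seen = []

// The inventory service handles the command and announces what happened.
bus.listen('inventory.product.catalog', (product) => {
  // ...persist the product here...
  bus.publish('inventory.product.cataloged', product)
})

// Any other service can react to the fact that it happened.
bus.on('inventory.product.cataloged', (product) => seen.push(product.sku))

bus.send('inventory.product.catalog', { sku: 'ABC-123' })
console.log(seen) // [ 'ABC-123' ]
```

`EventEmitter` delivers synchronously, which keeps the sketch easy to follow; a real broker delivers asynchronously, which is exactly where the eventual consistency discussed below comes from.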

Expanding on that, I also want to introduce a very simple mathematical equation:

state = leftFold([...previousEvents])

The state is the left fold of the previous events.
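Spelled out as a recurrence, the left fold starts from an initial state and applies each event in order. The symbols S_n (state after n events), e_n (the n-th event), and apply (the function that applies one event to a state) are notation introduced here for illustration; they are not from the original.

```latex
\begin{aligned}
S_0 &= \text{snapshot} \quad (\text{or the empty state}) \\
S_n &= \operatorname{apply}(S_{n-1},\, e_n), \qquad n = 1, 2, \dots \\
\Rightarrow\quad S_n &= \operatorname{foldl}(\operatorname{apply},\, S_0,\, [e_1, \dots, e_n]) \\
  &= \operatorname{foldl}(\operatorname{apply},\, S_k,\, [e_{k+1}, \dots, e_n]) \qquad 0 \le k \le n
\end{aligned}
```

The last line is why snapshots are safe: the fold only ever looks at the previous state and the next event, so replaying the remaining events onto a saved S_k yields the same S_n as replaying everything from the start.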

For those of you who speak JavaScript:

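// Start from the snapshot (an empty object by default) and merge each event's payload on top, in order.
const eventsourcing = (events, snapshot = {}) =>
  events.reduce(
    (state, event) => Object.assign({}, state, event.payload),
    snapshot
  )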
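Here is a short usage sketch of that function. The function is repeated so the snippet runs on its own, and the `user.*` events and their payloads are hypothetical. It also shows what the `snapshot` parameter buys you: folding the remaining events onto a saved intermediate state gives the same result as replaying everything from the beginning.

```javascript
// The fold from above: each event's payload is shallow-merged into the state.
const eventsourcing = (events, snapshot = {}) =>
  events.reduce((state, event) => Object.assign({}, state, event.payload), snapshot)

// Hypothetical events describing a user profile.
const events = [
  { type: 'user.registered', payload: { id: 1, name: 'Ada', plan: 'free' } },
  { type: 'user.upgraded', payload: { plan: 'pro' } },
  { type: 'user.renamed', payload: { name: 'Ada L.' } },
]

// Replay everything from an empty state...
const fromScratch = eventsourcing(events)

// ...or start from a snapshot taken after the first two events.
const snapshot = eventsourcing(events.slice(0, 2))
const fromSnapshot = eventsourcing(events.slice(2), snapshot)

console.log(fromScratch) // { id: 1, name: 'Ada L.', plan: 'pro' }
```

Note that a plain shallow merge can only add or overwrite fields. Events that remove something need a reducer that knows about each event type.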

The events ARE a normalized, immutable source of truth for your domain.

The current state is therefore derived by applying the events on top of each other, in order.

If you know everything that has happened in your subset of the world — your bounded context — then you can determine the state of that world.

Here’s a simple, admittedly contrived example: imagine you are building a robot that picks up items and places them on the surface of a table.

The table could be your aggregate.

Are you using a table right now? On it is probably a laptop, or maybe a TV remote.

The context in this case is the problem at hand: we want to know what objects are on the table and what surface area is available for new items. Our model only needs to contain information relevant to that task.

You can imagine that if you were writing software for a warehouse, your idea of what a table is and what information you would care about might be very different.

The context is important. In DDD, this is what Evans refers to as a bounded context.

Anyway, on with the example…

Let’s command the robot to place a ball on the table.

To do this, I use a library called “servicebus”. Servicebus is really cool because it allows you to use middleware for events, so you can easily add things like retry logic, Redis-backed deduping, or tracing with very little effort. It was originally built on RabbitMQ, but I’ve been working on a Kafka version as well, which supports the original plugins.
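Servicebus’s own middleware API is not reproduced here. As a library-agnostic sketch of the idea, assuming every message carries a unique `id`, middleware can be modeled as functions that wrap a handler. The `compose`, `dedupe`, and `retry` helpers are hypothetical, and an in-memory `Set` stands in for the Redis store, so this version only dedupes within a single process.

```javascript
// Hypothetical, library-agnostic middleware: each one wraps the next handler.
const compose = (...middleware) => (handler) =>
  middleware.reduceRight((next, mw) => mw(next), handler)

// Skip messages whose id was already handled successfully.
// A Set stands in for Redis, so this only dedupes within one process.
const dedupe = (seenIds = new Set()) => (next) => (msg) => {
  if (seenIds.has(msg.id)) return 'duplicate'
  const result = next(msg)
  seenIds.add(msg.id) // only mark as seen once handling succeeded
  return result
}

// Retry a synchronous handler up to `attempts` times before giving up.
const retry = (attempts = 3) => (next) => (msg) => {
  let lastError
  for (let i = 0; i < attempts; i++) {
    try {
      return next(msg)
    } catch (err) {
      lastError = err
    }
  }
  throw lastError
}

// A handler that fails on its first call, simulating a transient error.
let calls = 0
const flakyHandler = (msg) => {
  calls++
  if (calls === 1) throw new Error('transient failure')
  return `handled ${msg.id}`
}

// dedupe runs first, then retry, then the handler.
const handle = compose(dedupe(), retry(3))(flakyHandler)
console.log(handle({ id: 'm1' })) // prints "handled m1" (after one retry)
console.log(handle({ id: 'm1' })) // prints "duplicate"
```

Marking a message as seen only after it succeeds means a message that ultimately fails can still be redelivered and tried again.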

bus.send('table.item.place', {
  type: 'ball',
  properties: { color: 'red' },
  position: { top: 1, left: 1, unit: 'inch' }
})

When it happens, the robot can confidently declare “I placed the ball on the table! It’s positioned 1 inch from the top, and 1 inch from the left!”

It happened.

It can’t unhappen.

It’s an immutable fact. The ball was placed on the table. Period.

Let’s let the rest of the world know, so they can respond to the event if they are subscribed.

bus.publish('table.item.placed', { item })

Now, I want to stress that it is an immutable fact that this event occurred.

The newspaper has already been published and sent out the door!

If you want to undo it, your only option is another command, table.item.remove, which, once it completes successfully, leads to the event table.item.removed being published.
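One way to connect these commands to their events is a command handler in front of a Table aggregate. This is a sketch under assumptions: the `Table` class, its methods, the `handle` helper, the item `id`, and the `published` array (standing in for `bus.publish`) are all hypothetical. The aggregate checks each command against its current state, records the resulting event, and changes state only by applying events; the handler then publishes what was recorded.

```javascript
// Hypothetical Table aggregate: decides whether a command is allowed,
// records the resulting event, and changes state only by applying events.
class Table {
  constructor(id) {
    this.id = id
    this.items = new Map() // itemId -> item currently on the table
    this.changes = []      // events recorded but not yet published
  }

  // Handles the table.item.place command.
  place(item) {
    if (this.items.has(item.id)) {
      throw new Error(`Item ${item.id} is already on the table`)
    }
    this.record({ type: 'table.item.placed', payload: { tableId: this.id, item } })
  }

  // Handles the table.item.remove command: the only way to "undo" a placement.
  remove(itemId) {
    if (!this.items.has(itemId)) {
      throw new Error(`Item ${itemId} is not on the table`)
    }
    this.record({ type: 'table.item.removed', payload: { tableId: this.id, itemId } })
  }

  record(event) {
    this.apply(event)
    this.changes.push(event)
  }

  apply(event) {
    if (event.type === 'table.item.placed') {
      this.items.set(event.payload.item.id, event.payload.item)
    } else if (event.type === 'table.item.removed') {
      this.items.delete(event.payload.itemId)
    }
  }
}

// Stand-in for bus.publish: collects what would be sent to subscribers.
const published = []
const publish = (type, payload) => published.push({ type, payload })

const table = new Table('kitchen')

// Hypothetical command handler: run the command, then publish its events.
const handle = (runCommand) => {
  runCommand(table)
  table.changes.forEach((event) => publish(event.type, event.payload))
  table.changes = []
}

// An `id` on the item is assumed here; the original command had none.
handle((t) => t.place({
  id: 'ball-1',
  type: 'ball',
  properties: { color: 'red' },
  position: { top: 1, left: 1, unit: 'inch' },
}))
handle((t) => t.remove('ball-1'))

console.log(published.map((e) => e.type)) // [ 'table.item.placed', 'table.item.removed' ]
```

A rejected command throws before anything is recorded, so nothing is published for it: only facts that actually happened reach the rest of the world.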

If you are a third party that cannot see the table, but you are subscribed to the events that have immutably occurred about the table, you can determine the current state of the table.

A ball was placed on the table, and then removed. The current state is an empty table.
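Such a subscriber can rebuild the table’s state from the events alone. Because a removal cannot be expressed by merging payloads, this sketch uses a small reducer that switches on the event type instead of the generic `Object.assign` fold. The event shapes and the `applyTableEvent` name are assumptions, mirroring the `table.item.placed` and `table.item.removed` names above.

```javascript
// Assumed event shapes, mirroring table.item.placed / table.item.removed.
const events = [
  { type: 'table.item.placed', payload: { item: { id: 'ball-1', type: 'ball' } } },
  { type: 'table.item.removed', payload: { itemId: 'ball-1' } },
]

// Pure reducer: returns a new state for each event and never mutates the old one.
const applyTableEvent = (state, event) => {
  switch (event.type) {
    case 'table.item.placed':
      return { ...state, [event.payload.item.id]: event.payload.item }
    case 'table.item.removed': {
      const { [event.payload.itemId]: _removed, ...rest } = state
      return rest
    }
    default:
      return state // ignore events this subscriber doesn't care about
  }
}

// The ball was placed, then removed: the table is empty.
const tableState = events.reduce(applyTableEvent, {})
console.log(tableState) // {}

// Stopping after the first event shows the intermediate state.
const afterFirst = events.slice(0, 1).reduce(applyTableEvent, {})
console.log(Object.keys(afterFirst)) // [ 'ball-1' ]
```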

This sort of event based architecture is known as an “Eventually Consistent” system.

The third party does not know the instant the ball is placed on the table. However, it receives a message stating that the event occurred, and once the message has been received, the receiver can determine the new state of the table.

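It’s a very popular pattern as well, largely because of the CAP theorem, which stands for Consistency, Availability, and Partition tolerance. The rule of thumb is that you can only pick two; more precisely, when the network is partitioned, a system has to choose between staying consistent and staying available. Different parts of the system can optimize for different goals, and the system at large generally gives up strong consistency in order to stay available.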

For example, it’s ok if you didn’t know Billy posted that rad new Instagram until 23 seconds later. You eventually get it.

Although I’ve been building services like this for years, the approach has recently been making the rounds again under the name “event-driven architecture”.

It’s also one step away from CQRS. All you need is to subscribe to a few event streams and create a projection of the data that is suited for the application at hand. The process is called denormalization, and hence I call services that do this “denormalizers”.
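A denormalizer is the same kind of fold, pointed at a read model shaped for one specific query. As a sketch, assuming the hypothetical table events used above (each placed item carrying an `id` and a `type`), this one maintains a count of items by type, something a UI might display without ever querying the table service. The `createItemCountsDenormalizer` name and its `handle`/`query` functions are illustrative, not from any library.

```javascript
// Hypothetical denormalizer: consumes table events and maintains a read model
// shaped for one query, "how many items of each type are on the table?"
const createItemCountsDenormalizer = () => {
  const itemTypes = new Map() // itemId -> item type, needed to handle removals
  const countsByType = {}     // the read model itself

  const handle = (event) => {
    if (event.type === 'table.item.placed') {
      const { id, type } = event.payload.item
      itemTypes.set(id, type)
      countsByType[type] = (countsByType[type] || 0) + 1
    } else if (event.type === 'table.item.removed') {
      const type = itemTypes.get(event.payload.itemId)
      if (type === undefined) return // unknown item: nothing to undo
      itemTypes.delete(event.payload.itemId)
      countsByType[type] -= 1
      if (countsByType[type] === 0) delete countsByType[type]
    }
  }

  // Return a copy so callers cannot mutate the read model directly.
  return { handle, query: () => ({ ...countsByType }) }
}

const denormalizer = createItemCountsDenormalizer()

// In a real system these would arrive through subscriptions, e.g. bus.on(...).
const events = [
  { type: 'table.item.placed', payload: { item: { id: 'b1', type: 'ball' } } },
  { type: 'table.item.placed', payload: { item: { id: 'b2', type: 'ball' } } },
  { type: 'table.item.placed', payload: { item: { id: 'r1', type: 'remote' } } },
  { type: 'table.item.removed', payload: { itemId: 'b1' } },
]
events.forEach((event) => denormalizer.handle(event))

console.log(denormalizer.query()) // { ball: 1, remote: 1 }
```

The read model keeps only what its query needs, plus the small `itemTypes` index required to process removals correctly.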

Which reminds me of a funny story: one time, on the Slack where I hang out, somebody was asking about CQRS and I accidentally wrote “demoralizer”. He was like, “Is there seriously a thing called a demoralizer?” 🤣

Do yourself a favor and read this still-relevant article from 2012: The Log: What every software engineer should know about real-time data’s unifying abstraction | LinkedIn Engineering, by Jay Kreps, co-creator of Kafka.

3. Automate your operations and infrastructure

DevOps — the intersection of Development and Operations — is in a renaissance.

As I mentioned earlier, with the simplicity of services, some complexity necessarily moves into your operations and architecture.

I knew I wanted to build highly scalable, fault-tolerant systems made up of simple services that were maintainable, long-lived, malleable, and easy for humans to understand.

I knew microservices and Domain-Driven Design patterns would let me deliver on all of those goals, and would allow my team and me to consistently deliver high-quality code that could stand the test of time.

However, just running them in production turned out to be a pretty monumental task.

I was new to DevOps and didn’t even know where to start.

After some googling, I came across a ton of AWS courses. There were five different levels of certification, and each took months of time and hundreds of dollars in lessons.

This was a tall order. I knew the de facto standard was to become an AWS Certified Solutions Architect… The problem was that this was the 5th level of AWS certification, and until this point, I had basically only deployed to PaaS providers like Heroku and Modulus, or someone else had handled DevOps.

So finally, after spending years figuring out enough of the whole DDD and CQRS/ES+AR thing, I still had to become an expert in an entirely different subject area just to be able to do it effectively.

Remember those CSS sliding doors I talked about? The ones that took about 100 lines of HTML and CSS just to make a tab with rounded corners?

Luckily, things get simpler over time.

It’s easier than ever to get dangerous with DevOps. Read about My Journey to DevOps Bliss, Without Useless AWS Certifications and make sure to grab your free three week email course at the end!

Conclusion

That’s all for today! Thanks for reading. If you have any questions, or if you’ve found this helpful I’d love to hear your thoughts in the comments.

The best way to help me reach others is sharing on social media!

Best,
Patrick Lee Scott

P.S. Jørn André Myrland asked a great question in the comments. Make sure you check out my answer for a better, higher-level picture. (https://medium.com/@patrickleet/glad-you-asked-1c5229ee3af6)

Want to learn more about me and my story? Click here to read about me and how my agency, Unbounded, can help you build distributed systems.