Migrating to Microservices and Event-Sourcing: the Dos and Don’ts

Written by pedro.costa | Published 2018/02/13
Tech Story Tags: microservices | event-sourcing | software-architecture | apache-kafka | software-development


Buzzwords like ‘Microservices’ and ‘Event-Sourcing’ are definitely not new to the software industry, as they’ve been used to address problems common to most large-scale enterprise applications. Now that there’s more accumulated knowledge about them, and increasing numbers of applications are being built using such patterns, it’s easier to draw some conclusions. Here, I’ll enumerate some key factors to take into account when pondering the migration of a big monolithic application to a microservices and event-sourcing architecture. I will also provide some guidelines for you to follow throughout the migration process, while making some considerations about the libraries and tools I used myself, like Apache Kafka and Protobuf, to accomplish this task on some projects I’ve recently worked on.

Microservices

I’m assuming the reader already knows something about microservices architectures; nevertheless, why not quote Martin Fowler on them:

In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms (…). These services are built around business capabilities and independently deployable by fully automated deployment machinery, (…) may be written in different programming languages and use different data storage technologies.

Architects love them as they’re an effective way of separating large, tightly-coupled software monoliths into small, independent, and loosely-coupled services. These individual services should have a limited scope, well-defined responsibilities, and their own data-model.

This set of characteristics enforces modularity and responsibility segregation, qualities that increase the autonomy of individual development teams, as they’ll be working on tasks with well-defined borders, decoupled dependencies, and less overlap with other teams’ work. This kind of autonomy allows dev teams to choose the tools that best suit their needs for each problem at hand. DevOps teams should also love this approach, as microservices should be easy to deploy and run in their own containers.

I should also mention the benefits in scalability and fault tolerance: microservices running in isolated processes are easier to scale during usage peaks by simply launching new instances of the services, and the system as a whole becomes more resilient as single points of failure decrease. If a specific service becomes degraded or fails, the others will be able to carry on their work independently.

Event-Sourcing

Source: Architecture documentation of wolkenkit

Event-sourcing is a whole different story. The idea here is to represent every state transition of the application in the form of an immutable event. Events are then stored in a log or journal form as they occur (also referred to as the ‘event store’). They can also be queried and stored indefinitely, aiming to represent how the application’s state, as a whole, evolved over time.
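
To make the idea more concrete, here’s a minimal sketch (in Java, with made-up event types) of how a piece of state can be derived purely by replaying its journal of events:

```java
import java.util.List;

// Minimal sketch with made-up event types: the current balance is never
// stored as the source of truth, it is derived by replaying the journal.
interface AccountEvent {}
record Deposited(long cents) implements AccountEvent {}
record Withdrawn(long cents) implements AccountEvent {}

final class AccountBalance {
    private long cents;

    static AccountBalance replay(List<AccountEvent> journal) {
        AccountBalance balance = new AccountBalance();
        journal.forEach(balance::apply);
        return balance;
    }

    private void apply(AccountEvent event) {
        if (event instanceof Deposited d) cents += d.cents();
        if (event instanceof Withdrawn w) cents -= w.cents();
    }

    long cents() { return cents; }
}
```

The point is that the balance itself is never the source of truth: the journal is, and any projection of it can be rebuilt at any time.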

Why put microservices and event-sourcing together?

Remember that by choosing microservices, we’re aiming to build systems that can dynamically scale and adapt to the incoming traffic, and preferably scale down too, making better and more efficient use of the available resources and ultimately reducing the amount of resources our system requires.

Building Reactive applications, as described in the Reactive Manifesto, that respond to stimuli instead of relying on traditional blocking request/response usecases, is an effective way of making better use of the available resources and of increasing the overall system’s responsiveness. Instead of blocking and waiting for computations to complete, the application stays busy handling user requests asynchronously, while performing the heavy tasks on separate threads that won’t block the main usecases.
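
As a simple illustration, and only as a sketch (the service and pricing method below are made up), a non-blocking handler can hand the heavy work to a separate thread pool and free the request thread immediately:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

// Made-up service: the caller gets a future back immediately, while the
// expensive computation runs on a dedicated pool and completes the response later.
final class QuoteService {
    private final Executor pricingPool = Executors.newFixedThreadPool(4);

    CompletableFuture<Double> quoteFor(String symbol) {
        return CompletableFuture.supplyAsync(() -> expensivePricingModel(symbol), pricingPool);
    }

    private double expensivePricingModel(String symbol) {
        return 42.0; // placeholder for a slow computation or remote call
    }
}
```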

It is here that events come into play: events are at the core of thinking about reactive systems in terms of ‘messaging’, and they represent an effective way of achieving asynchronous communication between microservices, replacing traditional synchronous/blocking models like JSON over REST/HTTP or other peer-to-peer protocols. Representing the application’s most up-to-date state at each moment, events can be consumed by different microservices to build their local state and data-model. Microservices can in turn use Pub/Sub patterns to consume only the events that are relevant to their scope.

Konrad Malawski’s report Why Reactive? is a very good resource on how Reactive applications enable scalability and responsiveness, and on the complexity of distributed systems.

But seriously, should you really migrate?

Until now, we have identified microservices and events as patterns that enforce modular, reactive and non-blocking applications; we’ll now discuss ways for events to be stored, queried and broadcast to the interested services. But maybe it is time to answer the question you’ve probably been asking yourself for some time now: should I really migrate to a microservices and event-sourcing architecture? And the answer is: well, it depends!

A good rule of thumb to determine whether an application will fully benefit from this complex architecture is to look at its target platform: does your app have requirements to run on heterogeneous, client-specific platforms that you don’t have control over? Does your system require an old-fashioned “next, next, next”-type installer?

If you answered Yes to the questions above, your system probably won’t benefit from the full potential of a microservices architecture, as you won’t be able to monitor the platform’s health or dynamically size the resources to cope with the system’s load. On-demand spawning of new microservice instances to respond to high traffic peaks will also be unfeasible, as you won’t be in control of the available resources and usage statistics needed to do so efficiently.

Here’s another important question: does your app have high performance requirements, or will it be under thousands of user requests per second for long periods of time?

If this time you answered No, you probably won’t benefit much from asynchronism, as you’re likely building traditional synchronous, single-user usecases.

It’s also important to notice that if you’re talking about an application with a large and old code-stack that wasn’t built around a solid and modular architecture, one that enforces clear separation of functional responsibilities and relies on generic APIs for internal communication, it’ll probably be more effective for you and your team to rebuild it from scratch instead of investing a lot of time breaking it apart and changing its internal flows of communication. It’s important to make a cold and impartial judgement on whether your system is worth the move, as the migration process will certainly cost you a lot of time and resources.

Migrating a monolith into Microservices

The idea here is to perform an incremental migration, in order for it to be as smooth and painless as possible. Attempting to do it all at once would be a suicide mission: on one hand, no one in their right mind will be happy to deal with the ‘big-bang’ integration that this kind of migration implies; on the other hand, no manager will postpone the feature roadmap for 2 or 3 months so that the development team can turn the project’s architecture upside down.

The approach I recommend, and that has worked well for me, is to avoid rewriting the existing production code, at least until an actual problem is found or a new feature is to be delivered. Try to start building new services around the existing system in a more reactive style of development, only rewriting the old internals if and when needed, not sooner! The image below illustrates this idea, using facades that emulate async behavior.

Image 3 — Hiding old implementations behind a new Reactive API and adding new features using the new approach, migrating old functionality following an “as-needed” strategy [1]

1 - Identify the main usecases

The first, and perhaps hardest, task you’ll have at hand will be to define the services’ borders. Remember: a microservice should have a well-defined responsibility that maps to a more or less simple usecase.

As you should know by now, services should be as decoupled and autonomous from their peers as possible, and have their own persistence modules. This approach allows for more effective handling of the data-model, as each service will only manage and store the state it really needs, in a format well suited for its usecases (more points for modularity and separation of responsibility).

2 - Decouple internal components and re-define their data-model

Now that you have clearly identified the main usecases and defined the microservices’ borders, use this design phase to look at your application and identify how its current software layers map to the newly defined services. Don’t expect this to be an easy task either, as monolithic applications usually aren’t usecase-oriented: most of your usecases will cross several abstraction layers and perform several data transformations.

In a microservices approach, we aim to achieve top-down isolation, with each usecase requiring as little interaction with other services and as few data transformations as possible. To achieve high response throughput, it’s crucial to use an optimized data-model for each specific task. For instance, it may be helpful to have a JSON-backed database like Elasticsearch ready for a service that only produces JSON responses, or to consider a graph database like Neo4j for a service that only performs routing calculations.

3 - Design clean and generic APIs

Now it’s time to normalize the inter-service communication protocols by designing clean APIs that can evolve. Remember the events? Now you should implement them and choose a technology. During this design phase, you’ll be encoding all of your application’s state transitions into small, entity-oriented events. Those events will flow through the system and be consumed by several different services in order to build local state; it is therefore crucial to choose a solid technology for the event definition, with proven fast processing and good encoding/decoding performance.

I’ve had a very good experience using Google’s Protobuf; however, there are several other similar technologies fit for the task, like Thrift or Avro. There are several comparisons out there [2][3] showing that Protobuf and Thrift, with similar results, perform far better than plain JSON or XML in terms of payload sizes and serialization/deserialization times. Avro performs well for large objects, but not quite as well for our small events.

Table 1 - Small objects serialization time in ms [2]

Protobuf is a great protocol overall: it is used by Google in their core applications, it’s very well documented, and it’s widely adopted by the industry. It has also proven great at allowing APIs to evolve, thanks to its support for optional fields. This enables different versions of the same API to be used by different services, as long as one follows the approach of adding new fields as ‘non-required’, letting the application layer implement default logic for when such fields are absent.
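
For instance, here’s a rough sketch of that default logic, assuming a hypothetical protoc-generated `OrderEvent` class whose schema gained an optional `fee` field in a later version:

```java
import com.google.protobuf.InvalidProtocolBufferException;

// Sketch only: OrderEvent stands for a hypothetical protoc-generated class
// whose schema gained an optional 'fee' field in a later version.
final class OrderEventDecoder {
    private static final long DEFAULT_FEE_CENTS = 0L;

    long feeOf(byte[] payload) throws InvalidProtocolBufferException {
        OrderEvent event = OrderEvent.parseFrom(payload); // decode the Protobuf bytes
        // Producers built against the older schema never set 'fee';
        // the application layer decides what its absence means.
        return event.hasFee() ? event.getFee() : DEFAULT_FEE_CENTS;
    }
}
```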

4 - Add an event broker

If events are to be treated as first-class citizens in your application, you’d better find an effective way to make good use of them. The event broker you choose will be crucial to the overall performance, as it will be the central component enabling different microservices to communicate among themselves.

I have good things to say about Apache Kafka: it has proven to be a reliable and efficient platform for storing and dispatching thousands of events per second. You can find more details about Kafka’s replication and fault-tolerance models in the Kafka documentation.

Kafka can be used both as a message broker and as a persistent store of events: it can keep them on disk indefinitely, and they remain available to be consumed at any time (but not removed) from the Topics they were delivered to.
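
As a small sketch of the producing side (broker address, topic name and payload are illustrative), publishing an event with the plain Java Kafka client looks roughly like this:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Minimal producer sketch: broker address, topic name and payload are
// illustrative. Keying by an entity id keeps that entity's events ordered
// within a single partition.
public class FillPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            byte[] payload = new byte[0]; // e.g. a Protobuf-encoded Fill event
            producer.send(new ProducerRecord<>("fills", "order-42", payload));
            producer.flush();
        }
    }
}
```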

Image 4 — Journaling: a Kafka Topic storing time-ordered events

Your Topics will be immutable, ordered sequences of events that grow in a structured journal form. Events are assigned offsets that uniquely identify them within the Topic. Kafka can manage the offsets itself, easily providing “at most once” or “at least once” delivery semantics, but offsets can also be negotiated when an event consumer joins a Topic, allowing microservices to start consuming events from any arbitrary point in time, usually from where the consumer left off. If the offset of the last consumed event is transactionally persisted in the service’s local storage when the usecase ‘successfully completes’, that offset can easily be used to achieve “exactly once” event delivery semantics.
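
Here’s a rough sketch of that last idea using the plain Java consumer API; the topic, group id and the two local-storage helpers are hypothetical stand-ins for the service’s own transactional persistence:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

// Sketch: resume from an offset the service itself persisted transactionally,
// instead of relying only on Kafka's committed offsets, to get
// effectively-once processing. Topic name and helpers are hypothetical.
public class OrderBookProjector {

    public void run() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-book");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            TopicPartition fills = new TopicPartition("fills", 0);
            consumer.assign(Collections.singletonList(fills));
            consumer.seek(fills, lastProcessedOffset() + 1); // start right after the last handled event

            while (true) {
                for (ConsumerRecord<String, byte[]> record : consumer.poll(Duration.ofMillis(500))) {
                    // Apply the event and store the resulting state together with
                    // record.offset() in one local transaction.
                    applyAndStore(record);
                }
            }
        }
    }

    private long lastProcessedOffset() { return 0L; }                      // stub: read from local DB
    private void applyAndStore(ConsumerRecord<String, byte[]> record) { }  // stub: local transaction
}
```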

5 - Carefully design the event Topics

Image 5 — A possible (and very simplified) microservices + event-sourcing architecture for a Bitcoin Exchange. Highlights for entity-oriented topics and use-case oriented microservices

Topics are an important abstraction here, and the way you model them will make all the difference: they’re like repositories for events, and you can define as many as you wish. A strategy that has proven effective for me is to use a different Topic for each entity type, building them entity-oriented instead of usecase-oriented. This way, I always know that a certain Topic only represents the state transitions of a specific data-model entity, rather than, for instance, the usecases it has been a part of. This approach makes event consumption substantially easier, allowing services to consume only the state transitions they’re interested in, which is usually the only common information that is relevant to different microservices.

This concept is illustrated in Image 5: Topics store first-class entities, modelled as events like Logins, Orders or Fills, while microservices handle simple usecases like placing orders, displaying the Order Book or displaying the Price Chart. We can also see that stateless services don’t share their databases at all: the Trading History service uses a more JSON-oriented persistence model, while for the User Balance service a relational DB is enough. One should also notice that different stateful services, like Price Chart or Order Book, build their local state solely by consuming events from the Fills Topic and nowhere else.

For more complex usecases where communication among different services is indeed necessary, the responsibility for finishing the usecase must be clearly established: the usecase is decentralized and only finishes when all the services involved acknowledge their task as successfully completed; otherwise, the whole usecase must fail and corrective measures must be triggered to roll back any invalid local state. This is actually a very common pattern in microservices architectures, called the “Saga” pattern.
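
A very condensed sketch of a Saga step (all names here are illustrative): each service performs only its local transaction and then publishes either the event that moves the usecase forward or a compensating event that tells the other participants to undo their work.

```java
// Saga sketch with illustrative names: the payment service reacts to an
// OrderPlaced event, performs only its local transaction, and then publishes
// either the event that moves the usecase forward or a compensating event
// that lets the other participants roll back their local changes.
record OrderPlaced(String orderId, long amountCents) {}
record PaymentAccepted(String orderId) {}
record PaymentRejected(String orderId) {}

final class PaymentSagaStep {

    void onOrderPlaced(OrderPlaced event) {
        try {
            chargeCustomer(event.orderId(), event.amountCents()); // local transaction only
            publish("payments", new PaymentAccepted(event.orderId()));
        } catch (RuntimeException paymentFailed) {
            // Compensating event: upstream services (e.g. stock reservation)
            // listen for it and undo their own local state.
            publish("payments", new PaymentRejected(event.orderId()));
        }
    }

    private void chargeCustomer(String orderId, long amountCents) { /* stub */ }
    private void publish(String topic, Object event) { /* stub: send to the broker */ }
}
```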

6 - Glue the pieces together

Now that you have been through all the proposed (re)design phases, it’s time to deploy the new infrastructure and implement some simple new functionality. Remember: one service at a time. Give yourself room to experiment with the message broker and to collect performance statistics on the new reactive usecases. Don’t forget to properly measure your performance and response times, as unexpected latency overhead can be introduced before the platform is properly fine-tuned.

Final thoughts

Keep in mind that a microservices application is a distributed system, and distributed systems are intrinsically complex. That being said, don’t expect a migration to microservices and event-sourcing to be an easy task, and don’t expect to get it right on the first attempt. You’ll certainly make mistakes, but I believe they can be greatly reduced if you follow the main guidelines described here: perform an incremental migration, giving room for future improvements; carefully identify the main usecases and map them to individual services with well-defined borders and responsibilities; pragmatically upgrade the current system’s data-model into an event-driven design backed by generic APIs; and, finally, enforce reactive behavior. With that, you’ll have a clear plan to guide you throughout the migration process, one that will give you the confidence not only to proceed, but to proceed without fearing future requirements for your application.


References:
[1] Konrad Malawski, 2017, “Why Reactive?”
[2] Data Serialization Comparison, 2017, http://labs.criteo.com/2017/05/serialization/
[3] jvm-serializers, 2017, https://github.com/eishay/jvm-serializers/wiki

