A Non-Standard Way to Use Apache Kafka Message Broker

Written by timurnav | Published 2023/02/09
Tech Story Tags: programming | software-development | apache-kafka | coding-tips | software-architecture | backend | configuration-management | apache-kafka-message-broker

TL;DR: A use case where distributing business configs using Apache Kafka wasn't a bad idea.

Sometimes developers have to use technologies for purposes other than what they were designed for. There can be several reasons for this: a limited budget, the developers' lack of experience, limitations of the company's infrastructure, or even a direct ban from the company's management. Below is one such non-standard way to use the Apache Kafka message broker.

Many services need configuration that controls the application's logic. This is not about server ports or database credentials - it is meta-information used to manage the system, also known as "business configuration". The most common examples of such configuration are feature toggle states, variable coefficients in business formulas, A/B-test branching, and so on. In other words, something that remains unchanged between releases but can be modified at any moment if necessary.

If a service must provide high availability and low latency, this configuration cannot be fetched from a database on every request. And if the system consists of many microservices and requires horizontal scaling, the business configuration is usually managed by a separate dedicated microservice - typically an admin panel with authentication and authorization.

System Features

The solution described below was built for a cluster of dozens of high-throughput servers that routed messages from HTTP to Kafka, or from Kafka to Kafka. The routing logic had to change without restarting the servers, so the routing rules can be considered business configuration. Each rule contained a SQL-like predicate that was matched against the payload of the incoming message, and the rules had to be updatable online.
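To make this more concrete, here is a minimal sketch of what such a rule might look like on the router side. The names and fields below (RoutingRule, predicateSql, targetTopic, RuleMatcher) are illustrative assumptions, not the actual production model:

```java
import java.util.Map;

/** Hypothetical shape of a routing rule as the routers might see it. */
public record RoutingRule(
        long id,              // rule ID; later also used as the Kafka message key
        String predicateSql,  // SQL-like predicate, e.g. "country = 'NL' AND amount > 100"
        String targetTopic,   // where a matching message should be routed
        long version,         // incremented on every edit, lets consumers ignore stale updates
        boolean deleted       // soft-delete flag (see "Message Format" below)
) {}

/** A router would compile predicateSql with some expression engine and evaluate it per message. */
interface RuleMatcher {
    boolean matches(RoutingRule rule, Map<String, Object> payload);
}
```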

Configuration Management Service

Rule editing is implemented in a dedicated service - the admin panel, where access is regulated by roles and granted only to a limited group of users. The natural storage choice for the admin panel is a SQL database: it gives quick access and ACID transactions, which is perfect for maintaining master data.

The Shared Database approach is often used for admin panels, but in our case it was ruled out for security reasons. The router services were located on different subnets with physical security gateways between the internal networks, which made it hard to use solutions like Hazelcast, Redis, or Consul; the entire available arsenal was limited to Postgres, Kafka, and HTTP requests.

Why choose Kafka?

The first implementation was based on a REST API. At startup, the router service fetched the configuration from the admin-panel server and started working once it was received. After that, it polled for updates at a fixed interval.

Then external consumers appeared, for which HTTP access to our network was closed. So we came up with the idea of using Kafka topics to distribute configs, as this was one of the simplest and most reliable options available.

Each service reads the entire topic from the beginning at startup, thus receiving all updates and applying them to its state. Kafka guarantees FIFO ordering within a single partition, so changes can be applied incrementally - in the same order as they were made to the master data. Kafka also lets you configure message retention, so messages can be kept forever.
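A minimal sketch of such a startup reader, assuming a topic name like routing-rules and JSON string values (both assumptions, not the actual setup): the consumer uses manual partition assignment, always seeks to the beginning, and never commits offsets.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConfigTopicReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // offsets are never stored

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Manual assignment instead of a consumer group: every instance
            // replays the whole topic on every startup.
            TopicPartition tp = new TopicPartition("routing-rules", 0); // hypothetical topic name
            consumer.assign(List.of(tp));
            consumer.seekToBeginning(List.of(tp));

            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    // Apply each update in order; key = rule ID, value = serialized rule.
                    applyRuleUpdate(record.key(), record.value());
                }
            }
        }
    }

    private static void applyRuleUpdate(String ruleId, String json) {
        // Deserialize and merge into the in-memory rule registry (omitted here).
    }
}
```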

Since the security gateways between networks were still a problem, it was decided to work around them with replicas as follows:

  • A writable Kafka cluster lives in the internal network along with the admin panel;
  • Other networks have read-only replicas; since routers read topics from the very beginning, we don't even store consumer offsets;
  • Standard Kafka mechanisms are used for replication.

Eventual consistency was an acceptable trade-off here. A lag on the configuration consumers is not critical; it is enough to design the message schema so that messages do not depend on each other and each one leaves the consumer in a consistent state.

Features of implementing config distribution via Kafka

Message Format

  • Kafka has limitations on message size, so it was decided to write each updated row as a separate message.
  • You can't delete messages from Kafka arbitrarily; you can send a tombstone message, but that wasn't appropriate in our situation. Consumers keep the state of the configs and need to react to the deletion of an entity, so we added a deleted flag to the message instead.
  • Some settings may change frequently, so we decided to use log-compacted topics. These topics keep only the most recent message per key and remove older duplicates. The rule ID is used as the Kafka message key, which also ensures that all updates for a rule land in the same partition (see the sketch after this list).
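For illustration, here is how a log-compacted topic could be created with the Kafka AdminClient; the topic name, partition count, and replication factor are assumptions:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class ConfigTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            // Log compaction: only the latest message per key (rule ID) is retained,
            // so the topic stays small even if some rules change frequently.
            NewTopic topic = new NewTopic("routing-rules", 1, (short) 3) // hypothetical name and sizing
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```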

Consistency of data between Postgres and Kafka

It is impossible to write to two heterogeneous stores with a single guarantee: the database has its own transactions, and Kafka has its own. There are a few common solutions:

  • implement a transactional outbox;
  • use Kafka's own transactions;
  • use a CDC solution, e.g. Debezium.

Transactional outbox is a common pattern: a task is created transactionally as a record in an additional SQL table - the outbox - and these tasks are then shipped out asynchronously by a separate process.
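A minimal sketch of the write path of this pattern, with hypothetical table and column names (routing_rule, outbox): the business update and the outbox record are committed in one transaction, and a separate relay job later publishes unsent outbox rows to Kafka.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;

public class OutboxWriter {
    /**
     * Transactional-outbox write path: the rule update and the outbox record
     * become visible atomically; a relay job handles the Kafka publishing later.
     */
    public void saveRule(Connection conn, long ruleId, String ruleJson) throws Exception {
        conn.setAutoCommit(false);
        try (PreparedStatement updateRule = conn.prepareStatement(
                     "UPDATE routing_rule SET payload = ?, version = version + 1 WHERE id = ?");
             PreparedStatement insertOutbox = conn.prepareStatement(
                     "INSERT INTO outbox(aggregate_id, payload, sent) VALUES (?, ?, false)")) {
            updateRule.setString(1, ruleJson);
            updateRule.setLong(2, ruleId);
            updateRule.executeUpdate();

            insertOutbox.setLong(1, ruleId);
            insertOutbox.setString(2, ruleJson);
            insertOutbox.executeUpdate();

            conn.commit(); // both rows are committed together or not at all
        } catch (Exception e) {
            conn.rollback();
            throw e;
        }
    }
}
```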

Kafka transactions allow you to write records that remain invisible to consumers with certain settings (isolation.level=read_committed) until a signal to complete the transaction is sent. It is also possible to coordinate a Kafka write and commit with a relational database transaction, but this is still not a guaranteed way to keep them in sync, since the two commits happen one after the other.
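For reference, this is roughly what a transactional Kafka producer looks like; the transactional ID, topic, key, and payload below are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalConfigPublisher {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "config-publisher-1"); // required for transactions

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                producer.send(new ProducerRecord<>("routing-rules", "42", "{...rule json...}"));
                // Consumers with isolation.level=read_committed will not see this record
                // until commitTransaction() succeeds.
                producer.commitTransaction();
            } catch (Exception e) {
                producer.abortTransaction();
                throw e;
            }
        }
    }
}
```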

CDC (Change Data Capture) is a solution that captures updates from the database logs and publishes them to an external consumer, which could be Kafka. The drawback is that it is cumbersome: it requires a separate component and introduces an additional point of failure.

Instead, a solution was built that is similar to a transactional outbox and guarantees the order of sending. Each rule in the database has a sent status. After each update, within the same transaction, it is set to false; then we send the message to Kafka and update the status to true. If something goes wrong during sending, a failover job retries it. To avoid concurrency issues we also added a version column to the row, so clients can ignore outdated versions.
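A simplified sketch of this flow, with hypothetical table, column, and topic names (routing_rule, sent, version, routing-rules); the real setup also includes the failover job that re-runs the publish step for rows where sent = false:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RulePublisher {
    private final KafkaProducer<String, String> producer;

    public RulePublisher(KafkaProducer<String, String> producer) {
        this.producer = producer;
    }

    /** Step 1: in the same DB transaction as the edit, bump the version and reset the flag. */
    public void markUnsent(Connection conn, long ruleId) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE routing_rule SET sent = false, version = version + 1 WHERE id = ?")) {
            ps.setLong(1, ruleId);
            ps.executeUpdate();
        }
    }

    /** Step 2: publish to Kafka, then flip the flag; consumers drop messages with stale versions. */
    public void publish(Connection conn, long ruleId, String ruleJson) throws Exception {
        producer.send(new ProducerRecord<>("routing-rules", Long.toString(ruleId), ruleJson)).get();
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE routing_rule SET sent = true WHERE id = ?")) {
            ps.setLong(1, ruleId);
            ps.executeUpdate();
        }
    }
}
```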

Conclusion

Such a solution could be criticized by other developers, especially if they joined the team after you’d left it. But there's nothing wrong with getting things to work at the right time, with the resources that are available.


Written by timurnav | 7+ YOE in backend applications, architecture and design. Microservices, distributed systems. Senior SDE at Booking.com
Published by HackerNoon on 2023/02/09