NATS JetStream — a New Way to Create Resilient Message Queue

About a year ago, I was working on a project at my previous job. The goal was to set up a reliable message queue. Most people might go straight to RabbitMQ, and I almost did too. But I decided to look for something different. That's when I came across NATS JetStream. After working with it for the past year, I can say it's truly impressive. It's fast, it can handle a lot of data, and it fits well with cloud systems.

Today, I'll walk you through how to use NATS JetStream with a Golang app. By the time you finish reading this article, you'll be able to implement a robust and fast message queue.

Message queues

For those who are not familiar, what are message queues?

In the real world, message queues help systems talk to each other without getting overwhelmed. For example, when you order something online, your order might go into a message queue before it's processed. This way, even if many people order at the same time, the system can handle it smoothly.

Message queues are key for making sure data gets where it needs to go, especially when things get busy.

Project requirements

For my project, I had a clear list of needs when it came to selecting a message queue:

Durable Queues: It was crucial that our messages stayed safe. We couldn't afford to lose them due to network glitches or if a server needed to restart. In simple terms, once a message is in the queue, it should stay there until it's processed.

Retrying Messages: Sometimes, there can be small issues or temporary errors when a system tries to read a message. I needed a way to send these messages back into the queue so they could be tried again later. This ensures that no message gets left behind because of short-term problems.

Horizontal Scaling: As our system might get more users and data, I wanted our message queue to grow easily with us.

With these requirements in mind, I began my search for the perfect message queue solution.

Possible solutions

When searching for the right message queue, I explored several options. Here's a brief overview:

Gearman, Beanstalkd, and similar: I didn't dive too deep into these. They're older and seem to lack current support.

RabbitMQ: It's favored by many small businesses. However, I found it a bit slow. Setting the HA (High-available) cluster requires real expertise. Plus, it lacks the ability to “requeue” a message with delay.

Redis: It's excellent for PubSub and storing KV data. But as a queue system? Not the best. If Redis is set to store all messages, it slows down drastically.

Apache Kafka: It's a strong choice for event streaming. But it’s not the best choice for message queues. For instance, how can I requeue a message?

Apache Pulsar: It looked promising and fits the needs of large organizations. It met most of my criteria. Yet, horizontal scaling isn't straightforward. It demands a large infrastructure and a skilled team, which felt excessive for my project's scale.

Finally, I stumbled upon NATS JetStream. It seemed to align perfectly with my requirements.

How NATS JetStream fits to requirements

To understand how NATS JetStream works, I made some simple code and did tests. Here's what I learned:

Setting Things Up: I made two simple Golang CLI tools:

One for sending messages (NATS publisher)

...
publishedTotal := 0
for i := 0; i < 30000; i++ {
	subject := "test_subject_name"
	_, err = jetStream.PublishAsync(subject, []byte(strconv.Itoa(i)))
	if err != nil {
		return fmt.Errorf("publish: %w", err)
	}

	publishedTotal++
}
fmt.Printf("published %d messages\n", publishedTotal)
...

One for fetching messages (NATS consumer)

...
totalReceived := int32(0)
for {
	func() {
		ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
		defer cancel()

		msgs, fetchErr := jetStreamSubscriber.Fetch(1, nats.Context(ctx))
		if fetchErr != nil {
			if errors.Is(fetchErr, context.DeadlineExceeded) {
				return
			}

			log.Printf("sub fetch err: %s\n", fetchErr)
			return
		}

		for _, msg := range msgs {
			ackErr := msg.Ack()
			if ackErr != nil {
				log.Printf("sub ack err: %s\n", ackErr)
			}
			totalReceived++
			log.Printf("total received: %d\n", totalReceived)
		}
	}()
}
...

Durable queues

With these tools ready, I checked a few things:

Sending Many Messages:
- Question: What if I send thousands of messages all at once?
- Answer: The consumers took all the messages in. Nothing was lost.
Sending Without Consumer:
- Question: What if I send messages but there's no one to consume them?
- Answer: When a consumer is ready, it receives all the messages.
NATS Server Issues:
- Question: How do consumers act if the NATS server stops? Do they try to connect again when NATS is back?
- Answer: Yes, consumers connect back to NATS by themselves when it's working again.
Unread Messages and NATS Problems:
- Question: What about messages sent to NATS but not consumed, and then NATS server turns off?
- Answer: If NATS starts working again, all the waiting messages are consumed. This is because NATS keeps messages safe on the computer's storage.

NATS showed excellence aligning our first requirement. Let’s check out how it can send messages back to a queue.

Requeue

In NATS JetStream, requeuing a message that has encountered a problem is straightforward. You can achieve this by calling the Nak() function available in the message API. Invoking Nak() ensures that the message is sent back to the queue, effectively providing a second chance for it to be processed at a later time.

Another feature that adds to the utility of NATS JetStream is the ability to requeue messages with a custom delay. This functionality comes in handy when dealing with tasks that are deferred and need to be carried out after a certain period has elapsed. For example, if you have a notification system that must send alerts at specific intervals, you can use this feature to schedule these messages accurately.

Two requirements are nailed. There is one more important point about scalability.

Horizontal Scaling

When it comes to horizontal scaling, NATS JetStream makes the process relatively simple. Setting up a cluster involves running multiple NATS servers with JetStream enabled. All that's required is to specify the same cluster_name and define the routing paths between each node in the configuration.

I added an example of docker-compose.yml file to run a local NATS cluster:

version: "3.5"
services:
  nats1:
    container_name: nats1
    image: nats
    entrypoint: /nats-server
    command: --server_name N1 --cluster_name JSC --js --sd /data --cluster nats://0.0.0.0:4245 --routes nats://nats2:4245,nats://nats3:4245 -p 4222
    networks:
      - nats
    ports:
      - 4222:4222

  nats2:
    container_name: nats2
    image: nats
    entrypoint: /nats-server
    command: --server_name N2 --cluster_name JSC --js --sd /data --cluster nats://0.0.0.0:4245 --routes nats://nats1:4245,nats://nats3:4245 -p 4222
    networks:
      - nats
    ports:
      - 4223:4222

  nats3:
    container_name: nats3
    image: nats
    entrypoint: /nats-server
    command: --server_name N3 --cluster_name JSC --js --sd /data --cluster nats://0.0.0.0:4245 --routes nats://nats1:4245,nats://nats2:4245 -p 4222
    networks:
      - nats
    ports:
      - 4224:4222

networks:
  nats:
    name: nats

What adds to the utility is that you can connect to any node within the cluster, whether from a publisher or a consumer application, and it just works seamlessly. This implies that the system is designed to handle varying load distributions, routing the messages effectively irrespective of which node you are connected to.

It was clear to me that NATS JetStream aligned with all my requirements perfectly.

Pros and nuances

I spent considerable time with NATS JetStream and can outline its strengths and limitations based on my experience:

Pros

Broad Functionality: NATS JetStream is more than just a message queue system. It offers a variety of features that I'll elaborate in the next articles.
Go-Based: Being written in Go, the source code is accessible and provides insight into undocumented functionalities. This is particularly useful for Go developers who want to delve deeper into the system.
Strong Benchmarks: The performance metrics for NATS JetStream are impressive, making it a reliable choice for high-throughput scenarios.

Nuances

'At-Least-Once' Delivery: One thing to be aware of is that NATS ensures 'at-least-once' message delivery. As a result, your consumer logic must be idempotent to handle such cases.
Incomplete Documentation: While the documentation is generally good, it doesn't cover all configuration options. Sometimes, you'll need to go through the SDK code to understand certain functionalities fully.
Built-In Reconnection Logic: This could be seen as both a pro and a nuance. The SDK handles reconnections automatically, which means you don't have to code this logic yourself.

Performance

The performance benchmark was conducted by Steamnative, where NATS JetStream was compared with other popular message queues like Apache Pulsar and RabbitMQ. It's important to approach these results with a degree of skepticism, as the company is known for its affinity towards Apache Pulsar. They themselves noted that they faced difficulties in configuring NATS JetStream for the benchmark, which could potentially skew the results in favor of their preferred system.

NATS JetStream – During our tests, we attempted to follow the recommended practices for deliverGroups and deliverSubjects in NATS, but encountered difficulties. Our NATS subscriptions failed to act in a shared mode and instead exhibited a fan-out behavior, resulting in a significant read amplification of 16 times. This likely significantly impacted the overall publisher performance.

Conclusion

After working with NATS JetStream for an extended period, several aspects of the system became apparent:

Easy configuration.
High performance.
Broad functionality.

Beyond its capabilities as a message queue, NATS JetStream also functions as a message broker for event-streaming, PubSub, and synchronous Request-Reply message bus.

While this article provides an initial overview, I plan to discuss the diverse features of NATS JetStream in more detail in future articles. This will include a closer look at its specific functionalities and how they can be applied in different scenarios.