Difference between Kafka and RabbitMQ

Hello friends, for the last few days I have been looking into queue systems, trying to work out which one is better, when and where to use each type of system, and what their capabilities are.

I looked at Kafka and RabbitMQ and tried to put together a comparison between them.

RabbitMQ in a nutshell

Who are the players:
1. Consumer
2. Publisher
3. Exchange
4. Route (a binding from an exchange to a queue)

The flow starts from the publisher, which sends a message to an exchange. The exchange is a middleware layer that knows how to route the message to a queue. Consumers define which queue they consume from (by defining a binding). RabbitMQ pushes the message to the consumer, and once the message has been consumed and the acknowledgment has arrived, the message is removed from the queue.
Any piece of this system can be scaled out: producers, consumers, and RabbitMQ itself, which can be clustered and made highly available.
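
To make the flow above concrete, here is a minimal in-memory sketch of it. It is not the RabbitMQ client API: the class `DirectExchange`, its methods, and the queue and routing key names are all made up for illustration, and it only models a direct (exact match) exchange.

```python
from collections import deque


class DirectExchange:
    """Toy in-memory model of a direct exchange (hypothetical, not a RabbitMQ client)."""

    def __init__(self):
        self.bindings = {}  # routing key -> list of bound queue names
        self.queues = {}    # queue name -> deque of pending messages

    def bind(self, queue_name, routing_key):
        # A binding tells the exchange which queue receives which routing key.
        self.queues.setdefault(queue_name, deque())
        self.bindings.setdefault(routing_key, []).append(queue_name)

    def publish(self, routing_key, body):
        # Copy the message into every queue bound with this routing key.
        # A message with no matching binding is simply dropped here.
        for queue_name in self.bindings.get(routing_key, []):
            self.queues[queue_name].append(body)

    def deliver(self, queue_name, handler):
        """Push the head message to the handler; remove it only if the handler acks."""
        queue = self.queues[queue_name]
        if not queue:
            return False
        acked = handler(queue[0])
        if acked:
            queue.popleft()
        return acked


exchange = DirectExchange()
exchange.bind("orders", routing_key="order.created")
exchange.publish("order.created", "order #1")
exchange.publish("order.unknown", "no binding matches this key")

received = []


def handler(message):
    received.append(message)
    return True  # acknowledge the message


exchange.deliver("orders", handler)
```

After the acknowledged delivery the `orders` queue is empty, which is the "removed once consumed and acknowledged" behavior described above.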

Kafka

Who are the players:
1. Consumer / Consumer groups
2. Producer
3. Kafka Connect source connector
4. Kafka Connect sink connector
5. Topic and topic partition
6. Kafka Streams
7. Broker
8. ZooKeeper

Kafka is a robust system and has several members in the game, but once you understand the flow well, it becomes easy to manage and to work with.

A producer sends a message record to a topic. A topic is a category or feed name to which records are published, and it can be partitioned to get better performance. Consumers subscribe to a topic and start to pull messages from it. When a topic is partitioned, each partition is read by one consumer instance, and we call all instances of the same consumer a consumer group.

In Kafka, messages remain in the topic even after they have been consumed (for how long is defined by the retention policy).

Also, Kafka uses sequential disk I/O. This approach boosts the performance of Kafka and makes it a leading option among queue implementations, and a safe choice for big data use cases.
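
The idea that reading does not delete anything, and that only the retention policy does, can be sketched with a toy single-partition log. This is an assumption-heavy simplification and not Kafka code: `TopicLog` and its methods are invented names, time is passed in by hand, and real Kafka applies retention to log segments rather than to individual records.

```python
class TopicLog:
    """Toy single-partition log with time-based retention (hypothetical names)."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.records = []      # list of (offset, timestamp, value)
        self.next_offset = 0

    def append(self, value, now):
        # Every record gets the next offset; offsets are never reused.
        self.records.append((self.next_offset, now, value))
        self.next_offset += 1

    def read(self, offset):
        """Return the values of all retained records at or after the offset."""
        return [value for (o, _, value) in self.records if o >= offset]

    def enforce_retention(self, now):
        # Drop records older than the retention window, consumed or not.
        cutoff = now - self.retention_seconds
        self.records = [r for r in self.records if r[1] >= cutoff]


log = TopicLog(retention_seconds=60)
log.append("a", now=0)
log.append("b", now=30)

first_read = log.read(0)
second_read = log.read(0)        # reading again returns the same records

log.enforce_retention(now=70)    # cutoff is 10, so the record written at 0 expires
after_retention = log.read(0)
```

Here both reads return the same two records, and only the retention pass removes the older one.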

Let’s compare:

1. Distribution and parallelism

Both give a good distribution solution, but with some differences.
Let’s talk about consumers. In RabbitMQ you can scale out the number of consumers, which means that each queue can have many consumers. These are called competing consumers because they compete to consume the messages. In this form the message processing work is spread across all the active consumers, but each message is still processed only once.
In Kafka, the way to distribute consumers is by topic partitions: each partition is assigned to exactly one consumer in the group (a consumer may own more than one partition if there are fewer consumers than partitions).
You can use the partition mechanism to send each partition a different set of messages by business key, for example by user id, location, etc.
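
Routing by business key boils down to hashing the key and taking the result modulo the number of partitions. The sketch below assumes `zlib.crc32` as a stand-in hash (Kafka's default partitioner uses a different hash function), and `partition_for` plus the sample events are made up for illustration.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a business key to a partition; crc32 is a stand-in, not Kafka's hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3
events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "purchase"),
    ("user-1", "logout"),
]

# Append each event to the partition chosen by its key.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in events:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, value))

user_1_partition = partition_for("user-1", NUM_PARTITIONS)
user_1_events = [v for (k, v) in partitions[user_1_partition] if k == "user-1"]
```

Because the same key always hashes to the same partition, all events of `user-1` land in one partition in the order they were sent.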

2. High Availability

Both solutions are highly available, but Kafka took that a step further by using ZooKeeper to manage the state of the cluster (newer Kafka versions can replace ZooKeeper with the built-in KRaft mode). ZooKeeper itself is also highly available and can be distributed, so think of it as having a guard over a guard.

3. Performance

Kafka leverages the strength of sequential disk I/O and requires less hardware. This can lead to high throughput: several million messages per second with just a small number of nodes.
RabbitMQ can also process a million messages per second, but it requires 30+ nodes to do so.

4. Replication

Kafka replicates topic partitions across brokers by design. If the broker that leads a partition goes down, the work automatically passes to another broker that holds an in-sync replica, so acknowledged messages are not lost as long as the replication factor and acknowledgment settings are configured for it.
In RabbitMQ, queues aren’t replicated automatically; this needs to be configured.

5. Multi subscriber

In Kafka, a message can be read by multiple subscribers, meaning many different consumer types (consumer groups), not many instances of the same one.
In RabbitMQ, a message in a queue can be consumed only once, and once consumed it disappears and isn’t accessible anymore.
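
What makes multiple subscribers possible is that each consumer group tracks its own position in the log instead of removing messages. A minimal sketch of that idea follows; `SharedLog`, `poll`, and the group names are hypothetical and this is not the Kafka consumer API.

```python
class SharedLog:
    """Toy log where every consumer group keeps its own read offset (hypothetical)."""

    def __init__(self):
        self.records = []
        self.offsets = {}  # group id -> next offset to read

    def append(self, value):
        self.records.append(value)

    def poll(self, group_id):
        """Return the unread records for this group and advance only its offset."""
        start = self.offsets.get(group_id, 0)
        batch = self.records[start:]
        self.offsets[group_id] = len(self.records)
        return batch


shared = SharedLog()
shared.append("order #1")
shared.append("order #2")

billing_batch = shared.poll("billing")
analytics_batch = shared.poll("analytics")   # still sees both records
billing_again = shared.poll("billing")       # nothing new for billing
```

Both groups receive both orders, and a second poll by the same group returns nothing until new records arrive.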

6. Message ordering

Because Kafka has partitions, you can get message ordering.
Messages are routed to partitions by message key, so when you choose the key correctly, all messages for a given key land in one partition, in order.
In RabbitMQ this is harder: a single queue keeps its order, but once several competing consumers read from it, the processing order is no longer guaranteed. You can try to mimic the Kafka behavior by defining many queues and sending each key to its own queue, but at scale this can be hard to get right.
Log compaction: if the same message key has arrived multiple times, Kafka can keep only the last value in the log and delete the older messages.
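
The effect of log compaction can be shown in a few lines. This sketch only reproduces the outcome (last value per key survives); the function `compact` and the sample records are made up, and real Kafka compacts in the background on log segments.

```python
def compact(records):
    """Keep only the latest record per key, preserving the order of survivors."""
    last_index = {}
    for i, (key, _) in enumerate(records):
        last_index[key] = i  # later occurrences overwrite earlier ones
    return [rec for i, rec in enumerate(records) if last_index[rec[0]] == i]


records = [
    ("user-1", "a@example.com"),
    ("user-2", "b@example.com"),
    ("user-1", "c@example.com"),   # newer value for user-1
]

compacted = compact(records)
```

After compaction only the newest address of `user-1` remains, while the single record of `user-2` is untouched.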

7. Message protocols

RabbitMQ supports standard messaging protocols such as AMQP, STOMP (text based), MQTT (lightweight publish/subscribe messaging) and HTTP, while Kafka uses its own binary protocol over TCP, built from primitive types (int8, int16, int32, int64, string, arrays).

8. Message lifetime

Because Kafka is a log, messages stay there after being read; you control how long by defining a message retention policy.
RabbitMQ is a queue, so messages are removed once they have been consumed and the acknowledgment has arrived.

In RabbitMQ you can make messages survive a broker restart by marking the queue as durable and the messages as persistent. But from the docs: the persistence guarantees aren’t strong.

9. Message acknowledgment

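In both systems the producer can get confirmation that a message arrived in the queue/topic. On the consumer side, a RabbitMQ consumer sends an acknowledgment when a message has been consumed successfully, while a Kafka consumer commits the offset it has processed. Configured properly, this lets you be sure that messages didn’t get lost on the way.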

10. Flexible routing to a topic/queue

In Kafka a message is sent to a topic, and the key decides the partition. In RabbitMQ there are more options, for example routing by wildcard patterns on the routing key (topic exchanges) or by message headers. Check the docs for more information.
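
To give a feel for wildcard routing, here is a small sketch of how a topic exchange pattern is matched against a routing key: words are separated by dots, `*` stands for exactly one word and `#` for zero or more words. The function `topic_matches` is my own illustration, not RabbitMQ code.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Match a routing key against a topic-exchange style pattern."""

    def match(p, k):
        if not p:
            return not k           # pattern exhausted: key must be exhausted too
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb zero or more words of the key.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if head == "*" or head == k[0]:
            return match(rest, k[1:])   # '*' or a literal consumes one word
        return False

    return match(pattern.split("."), routing_key.split("."))


one_word = topic_matches("order.*", "order.created")
too_long = topic_matches("order.*", "order.created.eu")
any_depth = topic_matches("order.#", "order.created.eu")
```

With these patterns, `order.*` accepts `order.created` but not `order.created.eu`, while `order.#` accepts both.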

11. Message priority

In RabbitMQ you can define message priorities so that messages with a higher priority are consumed first. For more information look at https://www.rabbitmq.com/priority.html
This is hard to achieve in Kafka (it can be done with message keys, but at large scale this can be hard).
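
The behavior of a priority queue is easy to sketch with a heap. This is only a model of the ordering, under the assumption that a higher number means a higher priority; `ToyPriorityQueue` and the sample messages are invented, and in RabbitMQ itself priorities are enabled per queue as described in the linked docs.

```python
import heapq
import itertools


class ToyPriorityQueue:
    """Toy queue that hands out the highest-priority message first (hypothetical)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker to keep FIFO order

    def publish(self, body, priority=0):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), body))

    def consume(self):
        return heapq.heappop(self._heap)[2]


queue = ToyPriorityQueue()
queue.publish("low", priority=1)
queue.publish("urgent", priority=9)
queue.publish("low-2", priority=1)

consumed = [queue.consume() for _ in range(3)]
```

The urgent message jumps the line, and messages with the same priority keep their publish order.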

12. Monitoring

In Kafka you have third-party tools (some require a license for a production environment):

Confluent https://www.confluent.io/product/control-center/
Landoop http://www.landoop.com/
Burrow https://github.com/linkedin/Burrow
Kafka Tool http://www.kafkatool.com/

In RabbitMQ you have a built-in management UI (default <host_name>:15672).

13. Transaction support

Both support atomic writes, meaning that if you write a batch of messages and one of them fails, the whole transaction is rolled back. This is used heavily in Kafka stream processing.

Use Kafka if you need:

Time travel/durable/commit log
Many consumers for the same message
High throughput
Stream processing
Replicability
High availability
Message order

Use RabbitMQ if you need:

Flexible routing
Priority queue
A standard protocol message queue

Conclusion:

RabbitMQ is enough for simple use cases with low data traffic, and it gives you certain benefits like a priority queue and flexible routing options. But for massive data and high throughput, use Kafka without debate.

If you need a commit log or multiple consumers for the same messages, go with Kafka, because RabbitMQ can’t help you there.


Published by Tarun
