multiple nodes for message processing Kafka

Question

we have a spring boot app deployed on Kubernetes that processes messages: it reads from a Kafka topic and then it does some mappings and finally, it writes to Kafka topics

In order to achieve higher performance, we need to process the messages faster and hence we introduce multiple nodes of this spring boot app.

but I believe this would lead to a problem because:

The messages should be processed in order
the message contains a state

Is there any solution to keep the messages in order and to guarantee that a message already processed by a node wouldn't be processed by another and to resolve any other issues caused by the processing in multiple nodes.

Please feel free to address all possible solutions because we are building a POC.

does the use apache flink or spring-cloud-stream helpful for this matter?

mike mike · Accepted Answer · 2020-04-24T06:18:17

When consuming messages from Kafka it is important to keep the concept of a Consumer Group in mind. This concept ensures that nodes that read from a Kafka topic and sharing the same Consumer Group will not interfere with each other. Whatever has been read by one of the consumers within the Consumer Group will not be read again by another consumer of the same Consumer Group.

In addition, applications reading and writing to Kafka scale with the number of partitions in a Kafka topic.

It would not have any impact if you have multiple nodes consuming a topic with only one partition, as one partition can only be read from a single consumer within a Consumer Group. You will find more information in the Kafka documentation on Consumers.

When you have a topic with more than one partition, ordering might become an issue. Kafka only guarantees the order within a partition.

Here is an excerpt of the Kafka documentation describing the interaction between consumer group and partitions:

By having a notion of parallelism—the partition—within the topics, Kafka is able to provide both ordering guarantees and load balancing over a pool of consumer processes. This is achieved by assigning the partitions in the topic to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group. By doing this we ensure that the consumer is the only reader of that partition and consumes the data in order. Since there are many partitions this still balances the load over many consumer instances. Note however that there cannot be more consumer instances in a consumer group than partitions.

multiple nodes for message processing Kafka

3 Answers