0
votes

we have a spring boot app deployed on Kubernetes that processes messages: it reads from a Kafka topic and then it does some mappings and finally, it writes to Kafka topics

In order to achieve higher performance, we need to process the messages faster and hence we introduce multiple nodes of this spring boot app.

but I believe this would lead to a problem because:

  • The messages should be processed in order

  • the message contains a state

Is there any solution to keep the messages in order and to guarantee that a message already processed by a node wouldn't be processed by another and to resolve any other issues caused by the processing in multiple nodes.

Please feel free to address all possible solutions because we are building a POC.

does the use apache flink or spring-cloud-stream helpful for this matter?

3

3 Answers

1
votes

When consuming messages from Kafka it is important to keep the concept of a Consumer Group in mind. This concept ensures that nodes that read from a Kafka topic and sharing the same Consumer Group will not interfere with each other. Whatever has been read by one of the consumers within the Consumer Group will not be read again by another consumer of the same Consumer Group.

In addition, applications reading and writing to Kafka scale with the number of partitions in a Kafka topic.

It would not have any impact if you have multiple nodes consuming a topic with only one partition, as one partition can only be read from a single consumer within a Consumer Group. You will find more information in the Kafka documentation on Consumers.

When you have a topic with more than one partition, ordering might become an issue. Kafka only guarantees the order within a partition.

Here is an excerpt of the Kafka documentation describing the interaction between consumer group and partitions:

By having a notion of parallelism—the partition—within the topics, Kafka is able to provide both ordering guarantees and load balancing over a pool of consumer processes. This is achieved by assigning the partitions in the topic to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group. By doing this we ensure that the consumer is the only reader of that partition and consumes the data in order. Since there are many partitions this still balances the load over many consumer instances. Note however that there cannot be more consumer instances in a consumer group than partitions.

1
votes

The limit to scaling up with Flink will be the number of partitions in your Kafka topic -- in other words, each instance of Flink's Kafka consumer will connect to and read from one or more partitions. With Flink, the ordering will be preserved unless you re-partition the data. Flink does provide exactly-once guarantees.

A quick way to experience Flink and Kafka in action together is explore Flink's operations playground. This dockerized playground is set up to let you explore rescaling, failure recovery, etc., and should make all this much more concrete.

0
votes

You can run several consumer threads in a single application or even run several applications with several consumer threads. When all consumers belongs to the same group and Kafka topic has enough partitions Kafka will do balancing among topic partitions.

Messages in one partition are always ordered but to keep an order by the message key you should set max.in.flight.requests.per.connection=1. The broker always writes messages with the same key in the same partition (unless you change the partition number), so you will have all messages with the same key ordered.

One partition is readed by the only one consumer so the only way when another consumer gets processed messages is partitions rebalance before the message has ben acknowledged. You can set ack-mode=MANUAL_IMMEDIATE and acknowledge a message immediately after processing or use other acknowledge methods.

I'd recommend to read this article https://medium.com/@felipedutratine/kafka-ordering-guarantees-99320db8f87f