1
votes

I have been reading several articles about Kafka to understand consumer groups. I have one question: how does Kafka ensure that a message is processed only once, by one consumer in a consumer group?

Suppose there is more than one consumer in a consumer group. Does Kafka keep track of each message and offer it to each consumer in turn, one by one?

Any reference or help will be appreciated.

2 Answers

2
votes

First, a Kafka consumer group is mainly useful when your topic has more than one partition.

Consider the scenarios below:

Number of partitions: 3, consumers: 3

Kafka assigns one partition to each consumer. Unless a consumer fails and a consumer rebalance occurs (partitions are re-assigned to consumers), every consumer stays mapped to its partition and consumes that partition's events in order.

Number of partitions: 1, consumers: 3

If there are more consumers than partitions, Kafka does not have enough partitions to give every consumer one. One consumer of the group is assigned the partition, and the remaining consumers of the group sit idle.

Number of partitions: 4, consumers: 3

In this scenario, one of the consumers gets 2 partitions, and after a consumer rebalance a different consumer might be the one holding 2 partitions.
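The three scenarios can be reproduced with a small stand-alone sketch. It does not use the Kafka client; AssignmentSketch and assign are hypothetical names, and the plain round-robin rule is a simplification of the assignors Kafka actually ships. It assumes the consumer list is not empty.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Kafka client API.
class AssignmentSketch {

    // Distributes partitions 0..numPartitions-1 over the consumers in round-robin order.
    // Every partition ends up with exactly one owner inside the group.
    static Map<String, List<Integer>> assign(int numPartitions, List<String> consumers) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> group = List.of("c1", "c2", "c3");
        System.out.println(assign(3, group)); // {c1=[0], c2=[1], c3=[2]}
        System.out.println(assign(1, group)); // {c1=[0], c2=[], c3=[]}
        System.out.println(assign(4, group)); // {c1=[0, 3], c2=[1], c3=[2]}
    }
}
```

Because a partition never has two owners within the group, two consumers of the same group do not receive the same record while the assignment is stable.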

To your question of whether Kafka keeps some kind of track to maintain the sequence: yes, at the partition level. It stores a committed offset per consumer group for each partition, and the records within a partition are consumed in order.

No, at the topic level (unless the topic has a single partition).

** @mike's answer explains how the sequence is maintained at the partition level using committed offsets.

1
votes

The consumer can commit the offset of a message it has read from a topic so that it does not read that message again.

This can be achieved in two different ways:

  • set enable.auto.commit to true: "If true the consumer's offset will be periodically committed in the background." This is enabled by default, and you can use the consumer property auto.commit.interval.ms to change how often the commit happens. The default interval is 5 seconds. All details on the consumer configs are given in the Kafka documentation.
  • call consumer.commitSync() (or commitAsync()) in your code after polling the data.

Because a given partition can be consumed by at most one consumer within a consumer group, a commit is identified by consumer group, partition and offset.

The JavaDocs of the KafkaConsumer class are actually pretty good and give you all the details and examples for "Automatic Offset Committing" and "Manual Offset Control".
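To make the (consumer group, partition, offset) relation concrete, here is a minimal in-memory model. It is only a sketch: OffsetStoreSketch and its methods are hypothetical and are not how the broker stores offsets. Returning 0 when nothing has been committed is also an assumption made for simplicity; a real consumer falls back to its auto.offset.reset setting in that case.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of committed offsets; not the Kafka client or broker API.
class OffsetStoreSketch {

    // Key: "group/topic-partition". Value: offset of the next record the group should read.
    private final Map<String, Long> committed = new HashMap<>();

    private static String key(String group, String topic, int partition) {
        return group + "/" + topic + "-" + partition;
    }

    // Records that the group has finished everything before nextOffset in this partition.
    void commit(String group, String topic, int partition, long nextOffset) {
        committed.put(key(group, topic, partition), nextOffset);
    }

    // Position from which a consumer of this group resumes after a restart or rebalance.
    // Assumption: start at 0 when the group has never committed for this partition.
    long resumeFrom(String group, String topic, int partition) {
        return committed.getOrDefault(key(group, topic, partition), 0L);
    }

    public static void main(String[] args) {
        OffsetStoreSketch store = new OffsetStoreSketch();
        store.commit("billing", "orders", 0, 42L);
        System.out.println(store.resumeFrom("billing", "orders", 0)); // 42
        System.out.println(store.resumeFrom("billing", "orders", 1)); // 0, other partition
        System.out.println(store.resumeFrom("audit", "orders", 0));   // 0, other group
    }
}
```

The group names and the topic name are made up. The point is that each group tracks its own position per partition, so a second group reading the same topic is not affected by the first group's commits.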

Note: you wrote "how does Kafka ensure a message will be processed only once..."

I am not sure whether you are talking about "exactly-once delivery semantics" here, but keep in mind that without any additional effort the above approaches can still cause a consumer group to consume a message twice. Imagine this scenario:

  • You enable auto commit with a time interval of 5 seconds
  • Your KafkaConsumer polls the data and you are about to process it
  • After 2 seconds your processing throws an Exception and your job fails. That means the auto commit for that message did not happen.
  • Now, restarting your job will cause the consumer to read the same message again, because it was not committed yet.
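The scenario can be simulated without a broker. The sketch below is an assumption-laden model, not Kafka code: RedeliverySketch and run are hypothetical, the "log" is a plain list, and the commit is modelled as happening only after the whole batch was processed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of "crash before commit"; not the Kafka client API.
class RedeliverySketch {

    // Simulates one run of the job. Reading starts at committedOffset; each record is
    // appended to 'processed'. If processing fails at offset failAt (-1 = never), the run
    // ends without committing and the old committed offset is returned unchanged.
    static long run(List<String> log, long committedOffset, int failAt, List<String> processed) {
        for (int offset = (int) committedOffset; offset < log.size(); offset++) {
            if (offset == failAt) {
                return committedOffset; // job crashed before any commit happened
            }
            processed.add(log.get(offset));
        }
        return log.size(); // commit: offset of the next record to read
    }

    public static void main(String[] args) {
        List<String> log = List.of("m0", "m1", "m2");
        List<String> processed = new ArrayList<>();

        long committed = run(log, 0L, 2, processed);    // fails while handling m2
        committed = run(log, committed, -1, processed); // restart without failure

        System.out.println(processed); // [m0, m1, m0, m1, m2]
        System.out.println(committed); // 3
    }
}
```

m0 and m1 are handled twice because the first run never committed. This is why processing should be idempotent, or offsets should be committed together with the processing result, if duplicates are not acceptable.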