6
votes

I was reading this SO answer and several blog posts on the same topic.

What I know:

Multiple consumers can read from a single partition as long as they belong to different consumer groups. Within one consumer group, only one consumer can consume from a given partition at a time.

My question is related to multiple consumers from multiple consumer groups consuming from the same topic:

  1. What happens when multiple consumers (in different groups) consume a single topic (and eventually the same partition)?

  2. Do they get the same data?

  3. How are offsets managed? Are they tracked separately for each consumer?

  4. (Might be opinion based) What is the generally recommended way to handle overlapping data across two consumers from separate groups operating on a single partition?

Edit: by "overlapping data" I mean two consumers from separate consumer groups operating on the same partition and getting the same data.

2 Answers

8
votes
  1. Yes, they get the same data. Kafka stores only one copy of the data in each topic partition's commit log. Consumers that are not in the same group can each get the same data by issuing their own fetch requests through the consumer client library. Within a group, the assignment of partitions to group members is managed by the lead consumer of that group. The entire process is documented in detailed steps here: https://community.hortonworks.com/articles/72378/understanding-kafka-consumer-partition-assignment.html

  2. Offsets are "managed" by the consumers, but "stored" in a special __consumer_offsets topic on the Kafka brokers.

  3. Offsets are stored for each (consumer group, topic, partition) tuple. This combination is also used as the key when publishing offsets to the __consumer_offsets topic, so that log compaction can delete old, superseded offset commit messages and so that all offsets for the same (consumer group, topic, partition) tuple are stored in the same partition of the __consumer_offsets topic (which defaults to 50 partitions).
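
To make the three points above concrete, here is a minimal in-memory sketch. It is a toy model, not the Kafka client API: `ToyBroker`, `OffsetKey`, the topic name `orders` and the group names are all made up for illustration. It also assumes a group with no committed offset starts at offset 0, and it commits right after every fetch, whereas real consumers fetch and commit as separate steps.

```python
from collections import namedtuple

# Hypothetical names throughout; this models the idea, not the Kafka client API.
OffsetKey = namedtuple("OffsetKey", ["group", "topic", "partition"])


class ToyBroker:
    """In-memory stand-in for a broker holding one copy of each partition log."""

    def __init__(self):
        self.logs = {}            # (topic, partition) -> list of records
        self.offset_commits = []  # append-only stand-in for __consumer_offsets

    def produce(self, topic, partition, record):
        self.logs.setdefault((topic, partition), []).append(record)

    def commit(self, group, topic, partition, next_offset):
        # Every commit is appended under its (group, topic, partition) key.
        key = OffsetKey(group, topic, partition)
        self.offset_commits.append((key, next_offset))

    def committed(self, group, topic, partition):
        # The newest commit for a key wins. Assumption: unknown keys start at 0.
        key = OffsetKey(group, topic, partition)
        latest = 0
        for commit_key, value in self.offset_commits:
            if commit_key == key:
                latest = value
        return latest

    def fetch(self, group, topic, partition, max_records):
        # Read from this group's own position, then commit the new position.
        # (Simplification: real consumers commit in a separate step.)
        start = self.committed(group, topic, partition)
        log = self.logs.get((topic, partition), [])
        records = log[start:start + max_records]
        self.commit(group, topic, partition, start + len(records))
        return records

    def compact(self):
        # Log compaction keeps only the newest value for each key.
        newest = {}
        for key, value in self.offset_commits:
            newest[key] = value
        self.offset_commits = list(newest.items())


broker = ToyBroker()
for i in range(5):
    broker.produce("orders", 0, f"msg-{i}")

group_a_first = broker.fetch("group-a", "orders", 0, max_records=3)
group_b_all = broker.fetch("group-b", "orders", 0, max_records=5)
group_a_rest = broker.fetch("group-a", "orders", 0, max_records=5)
broker.compact()

print(group_a_first)  # first three records, read by group-a
print(group_b_all)    # all five records, unaffected by group-a's progress
print(group_a_rest)   # group-a resumes from its own committed offset
```

Both groups read the same single copy of the log, and each one advances only its own offset. After `compact()`, only the latest commit per (group, topic, partition) key remains.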

1
votes
  1. Each consumer group gets every message from a subscribed topic.
  2. Yes
  3. Offsets are stored per partition for each consumer group. For example, let's say you have a topic with 2 partitions and a consumer group named cg made up of 2 consumers. In that case Kafka assigns each of the consumers one of the partitions. The consumers then fetch the committed offset for the partition they were assigned from Kafka (e.g. one consumer 'asks' Kafka: "What is the offset for this topic for consumer group cg, partition 1?", and the other consumer asks the same for partition 2). After getting the correct offset, the consumer polls a Kafka broker for the next messages in that partition.
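
The flow in point 3 can be sketched roughly like this. It is a simplified round-robin model and not Kafka's actual assignor code: `assign_partitions`, `next_fetch_positions`, the topic name `events` and the offset values are all invented for the example. Partitions are numbered from 0 here, as Kafka does.

```python
def assign_partitions(consumers, partitions):
    """Round-robin sketch: each partition goes to exactly one group member."""
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for index, partition in enumerate(sorted(partitions)):
        assignment[ordered[index % len(ordered)]].append(partition)
    return assignment


# Committed offsets keyed by (group, topic, partition); the values are made up.
committed_offsets = {("cg", "events", 0): 42, ("cg", "events", 1): 7}

# Two consumers in group cg, two partitions: one partition each.
assignment = assign_partitions(["consumer-1", "consumer-2"], [0, 1])


def next_fetch_positions(group, topic, consumer):
    # Each consumer looks up the committed offset only for its own partitions.
    # Assumption: a partition with no committed offset starts at 0.
    return {
        partition: committed_offsets.get((group, topic, partition), 0)
        for partition in assignment[consumer]
    }


print(assignment)
print(next_fetch_positions("cg", "events", "consumer-1"))
print(next_fetch_positions("cg", "events", "consumer-2"))

# With more consumers than partitions, the extra consumer gets nothing.
print(assign_partitions(["a", "b", "c"], [0, 1]))
```

The last call shows why adding a third consumer to cg would not help with only 2 partitions: a partition is never shared inside a group, so the extra member sits idle.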

I'm not entirely sure what you mean by overlapping data. Can you clarify a bit or give an example?