
The Apache Kafka documentation states the following:

If all the consumer instances have the same consumer group, then the records will effectively be load balanced over the consumer instances.

If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes.

This is a bit unclear to me when I think about partitions. Does the second statement mean that, if I have multiple consumer groups, each consumer in each group will read all the records in all partitions?

As far as I understand it, the diagram used in the documentation does not agree with that reading.

[Image: Multiple consumer groups setup]

In fact, I was reading a great article, "Kafka in a Nutshell", and the statements quoted below agree much better with the diagram provided in the documentation.

Consumers can also be organized into consumer groups for a given topic — each consumer within the group reads from a unique partition and the group as a whole consumes all messages from the entire topic. If you have more consumers than partitions then some consumers will be idle because they have no partitions to read from. If you have more partitions than consumers then consumers will receive messages from multiple partitions. If you have equal numbers of consumers and partitions, each consumer reads messages in order from exactly one partition.

I was hoping someone could shed some light on the above and clearly explain a scenario based on the official Apache Kafka documentation.


1 Answer


Does that mean that each consumer in each group will read all the records in all partitions?

No. The statement assumes that each group has exactly one consumer (as indicated by "If all the consumer instances have different consumer groups").

So your overall understanding is correct. If you have multiple consumer groups, each message is delivered once to every group. Within a group, it goes only to the consumer that is assigned the message's partition.
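To make the difference concrete, here is a small toy simulation in plain Python. It does not use a Kafka client, and the round-robin assignment is only a simplification I chose for illustration, not Kafka's actual assignor. The names `assign_partitions`, `deliver`, `group-A`, `group-B` and the consumer names are all made up for this sketch.

```python
def assign_partitions(num_partitions, consumers):
    """Toy round-robin assignment of a topic's partitions to one group's consumers.

    Assumption: this only mimics the idea that each partition is owned by
    exactly one consumer per group; it is not Kafka's real assignment logic.
    """
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


def deliver(num_partitions, records, groups):
    """Simulate delivery of (partition, value) records to several groups.

    groups maps a group name to its list of consumer names.
    Returns {(group, consumer): [values received]}.
    """
    received = {(g, c): [] for g, cs in groups.items() for c in cs}
    for group, consumers in groups.items():
        assignment = assign_partitions(num_partitions, consumers)
        # Invert the assignment: which consumer owns each partition in this group.
        owner = {p: c for c, parts in assignment.items() for p in parts}
        for partition, value in records:
            received[(group, owner[partition])].append(value)
    return received


records = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]
groups = {"group-A": ["A1", "A2"], "group-B": ["B1"]}

result = deliver(4, records, groups)
for key in sorted(result):
    print(key, result[key])
```

In this model, group-A splits the four partitions between A1 and A2, so each of them sees only part of the records, while B1 is alone in group-B and sees all of them. If a group has more consumers than partitions, the extra consumers end up with an empty list, which matches the "idle consumers" case from the article.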