0
votes

This is with reference to the SimpleConsumer Example and the High Level Consumer Example.

The documentation seems to suggest that SimpleConsumers are responsible for managing their offsets themselves, and that they can choose to read a message multiple times or consume only a subset of the partitions in a topic. All this is possible because they build their own fetch requests and specify which offset they want.
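As a rough illustration of a consumer choosing its own offsets, here is a toy model in Python. It is not the Kafka client API: the list `partition_log` and the helper `fetch` are hypothetical names, and the only assumption carried over from Kafka is that a message's offset is its position in the partition.

```python
# Toy model of a single partition: an append-only list whose index is the offset.
# This is NOT the Kafka API; fetch() is a hypothetical helper for illustration.
partition_log = ["m0", "m1", "m2", "m3", "m4"]


def fetch(log, offset, max_messages):
    """Return (offset, message) pairs starting at the offset the caller asks for."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    end = min(offset + max_messages, len(log))
    return [(i, log[i]) for i in range(offset, end)]


# The caller, not the broker, decides where to read from.
first_read = fetch(partition_log, 0, 3)

# Rewinding to offset 1 reads m1 and m2 a second time.
replay = fetch(partition_log, 1, 2)
```

Because the caller passes the offset on every request, nothing in this model stops it from reading the same messages again, which is the behaviour described above.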

Now, if I have two clusters of simple consumers and each uses a different zookeeper to store its offsets, then it is very likely that both clusters will read duplicate messages. Is that understanding correct? To avoid duplication among them, they have to use a single zookeeper-cluster to store the offsets.

The concept of consumer-group applies only to the High-Level consumer. So if I have two clusters of high-level consumers and both use the same group-ID, then they will not get any duplicate messages.

Please suggest if the above is not correct.

2
You might want to read this: stackoverflow.com/documentation/apache-kafka/5449/… (it's for 0.9+, but the basics are the same) - Matthias J. Sax

2 Answers

0
votes

Simple consumers don't use Zookeeper to store the offsets. Using Zookeeper as a store for the processed record offsets is not recommended.

The concept of consumer-group applies only to the High-Level consumer. So if I have two clusters of high-level consumers and both use the same group-ID, then they will not get any duplicate messages.

What do you mean by two clusters? If both consumers belong to the same group (i.e. they have the same group-ID), then your statement is correct.

0
votes

If you are using High-level consumers with the same group-id, then there will be no duplication of messages while consuming from the same topic.
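The reason is that the partitions of a topic are divided among the members of a group, so each partition is read by only one member at a time. A minimal sketch of that idea in Python, assuming a simple round-robin split (the function `assign_partitions` and the member names are hypothetical, and Kafka's real assignment strategies may distribute partitions differently):

```python
def assign_partitions(partitions, members):
    """Round-robin sketch: every partition is handed to exactly one group member."""
    if not members:
        raise ValueError("a group needs at least one member")
    assignment = {member: [] for member in members}
    for i, partition in enumerate(sorted(partitions)):
        owner = members[i % len(members)]
        assignment[owner].append(partition)
    return assignment


# Two consumers with the same group-id sharing a four-partition topic.
assignment = assign_partitions([0, 1, 2, 3], ["consumer-a", "consumer-b"])

# No partition appears under more than one member.
owned = [p for parts in assignment.values() for p in parts]
```

Since no partition has two owners within the group, no message is delivered to two members of that group under this model.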

If you are using simple-consumers, it depends entirely on how you are maintaining the offsets. If both consumers keep their offsets in sync, i.e. they maintain the same offset level, then there won't be any duplication. In your case, it may cause duplication since you are maintaining the offsets separately.
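A toy comparison of the two setups, written in Python rather than against the Kafka API. The dictionaries stand in for whatever offset store each consumer uses, `consume_all` and the key `"topic-0"` are hypothetical, and the consumers are assumed to run one after another (truly concurrent consumers would need extra coordination that this sketch leaves out):

```python
partition_log = ["m0", "m1", "m2", "m3"]


def consume_all(log, offset_store, key="topic-0"):
    """Read from the stored offset to the end of the log, then record the new offset."""
    start = offset_store.get(key, 0)
    messages = log[start:]
    offset_store[key] = len(log)
    return messages


# Separate offset stores: neither consumer knows what the other has read,
# so both start at offset 0 and both see every message.
store_a, store_b = {}, {}
seen_a = consume_all(partition_log, store_a)
seen_b = consume_all(partition_log, store_b)

# One shared offset store: the second consumer resumes where the first stopped.
shared_store = {}
seen_first = consume_all(partition_log, shared_store)
seen_second = consume_all(partition_log, shared_store)
```

With separate stores both consumers read the full log; with the shared store the second consumer finds nothing left to read.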