
Our cluster runs Kafka 0.11 and has strict restrictions on using consumer groups. We cannot use arbitrary consumer groups, so an admin has to create the required consumer groups.

We run Kafka Connect HDFS Sinks to read data from topics and write to HDFS. All the topics have only one partition.

I am considering the following two patterns for using consumer groups with the Kafka HDFS Sink.

As shown in the pictures:

Case 1: Each topic has its own consumer group.

Case 2: All the topics share a common consumer group.

I am aware that when a topic has multiple partitions and a consumer fails, another consumer in the same consumer group takes over that partition.

My question:

Does the same thing happen when multiple topics share the same consumer group? That is, if a consumer (HDFS Sink) fails, will another consumer (HDFS Sink connector) take over its work and read from that topic?

Update: Each Kafka HDFS Sink Connector is subscribed to only one topic.


4 Answers

41 votes

I'm surprised that all the answers saying "yes" are wrong. I just tested it: using the same group.id for consumers of different topics works well and does NOT mean that they share messages, because for Kafka the key is (topic, group) rather than just (group). Here is what I did:

  1. created 2 different topics T1 and T2 with 2 partitions each
  2. created 2 consumers with the same group xxx
  3. assigned consumer C1 to T1, consumer C2 to T2
  4. produced messages to T1 - only consumer C1 assigned to T1 processed them
  5. produced messages to T2 - only consumer C2 assigned to T2 processed them
  6. killed consumer C1 and repeated steps 4-5. Only consumer C2 processed messages from T2
  7. messages from T1 were not processed

Conclusion: Consumers with the same group name subscribed to different topics will NOT consume messages from the other topics, because the key is (topic, group).
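The behavior in this test can be reproduced with a small model. This is a minimal sketch, assuming a simplified range-style assignor (loosely modeled on Kafka's RangeAssignor, not its actual code): during a rebalance, each topic's partitions are divided only among the group members that subscribed to that topic. The functions `range_assign` and `unassigned` are hypothetical helpers written for illustration.

```python
from collections import defaultdict


def range_assign(partitions_per_topic, subscriptions):
    """Simplified range-style assignment for one consumer group (a model,
    not Kafka's implementation).

    partitions_per_topic: {topic: partition_count}
    subscriptions: {member_id: set of subscribed topics}
    Returns {member_id: sorted list of (topic, partition)}.
    """
    assignment = {member: [] for member in subscriptions}
    # A topic's partitions are only handed to members subscribed to it.
    members_by_topic = defaultdict(list)
    for member, topics in subscriptions.items():
        for topic in topics:
            members_by_topic[topic].append(member)

    for topic, members in members_by_topic.items():
        members.sort()
        per_member, extra = divmod(partitions_per_topic[topic], len(members))
        start = 0
        for i, member in enumerate(members):
            # The first `extra` members each get one additional partition.
            count = per_member + (1 if i < extra else 0)
            assignment[member].extend((topic, p) for p in range(start, start + count))
            start += count
    return {m: sorted(parts) for m, parts in assignment.items()}


def unassigned(partitions_per_topic, assignment):
    """Topic-partitions that no member of the group currently owns."""
    owned = {tp for parts in assignment.values() for tp in parts}
    return sorted(
        (t, p) for t, n in partitions_per_topic.items() for p in range(n)
        if (t, p) not in owned
    )


if __name__ == "__main__":
    topics = {"T1": 2, "T2": 2}
    group = {"C1": {"T1"}, "C2": {"T2"}}  # same group, different topics
    print(range_assign(topics, group))

    # C1 dies: the group rebalances with only C2, which never subscribed to T1.
    del group["C1"]
    after = range_assign(topics, group)
    print(after)                      # {'C2': [('T2', 0), ('T2', 1)]}
    print(unassigned(topics, after))  # [('T1', 0), ('T1', 1)]
```

In this model, as in the steps above, T1's partitions stay unowned after C1 dies, because no surviving member of the group subscribed to T1.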

1 vote

Absolutely yes. The Kafka consumers should subscribe to both topics; Kafka will then assign the partitions (per topic) to the currently active members of the consumer group.

Regardless of whether each topic has one or multiple partitions, the remaining consumers in the group take over the partitions of a consumer that fails. When a failure happens, Kafka always triggers a rebalance to redistribute the partitions among the remaining active consumers of the group, so the work on those topics continues.

1 vote

Yes, as long as both consumers subscribe() to the same set of topics (topicA and topicB), the partitions of all those topics will be distributed across all consumers.

In your case this would mean that if one of the consumers fails, both topics will be assigned to the surviving consumer.
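A minimal sketch of that failover, assuming every member subscribes to the same topics and a simplified round-robin strategy (loosely modeled on Kafka's RoundRobinAssignor, not its actual code). `round_robin_assign` and the member names are hypothetical; each topic has one partition, as in the question.

```python
from itertools import cycle


def round_robin_assign(partitions_per_topic, members):
    """Deal all topic-partitions out in turn to the sorted members.

    Assumes identical subscriptions: every member subscribes to every topic
    in partitions_per_topic.
    """
    members = sorted(members)
    assignment = {m: [] for m in members}
    all_parts = sorted(
        (topic, p) for topic, n in partitions_per_topic.items() for p in range(n)
    )
    for tp, member in zip(all_parts, cycle(members)):
        assignment[member].append(tp)
    return assignment


if __name__ == "__main__":
    topics = {"topicA": 1, "topicB": 1}
    print(round_robin_assign(topics, ["consumer-1", "consumer-2"]))
    # {'consumer-1': [('topicA', 0)], 'consumer-2': [('topicB', 0)]}

    # consumer-2 fails; after the rebalance the survivor owns both topics.
    print(round_robin_assign(topics, ["consumer-1"]))
    # {'consumer-1': [('topicA', 0), ('topicB', 0)]}
```

The failover only happens here because the surviving member's subscription already includes the failed member's topic.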

0 votes

The question asks: if a consumer in a consumer group fails, will the remaining consumers in the same group pick up its subscribed topics and start processing them again?

But the accepted answer covers the scenario where topics are assigned to consumers. With automatic assignment (i.e., subscribe()), the idle consumers in the group should pick up the failed consumer's work and start reading from the last committed offset. If they don't, that breaks the consumer group parallelism architecture.
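Resuming "from the last committed offset" can be sketched with a toy offset store. This is a minimal model, assuming committed offsets are keyed by (group, topic, partition) and that the committed value is the offset of the next record to read; `OffsetStore`, `resume_position`, and the sample records are hypothetical, and `default=0` stands in for a reset policy such as earliest.

```python
class OffsetStore:
    """Toy model of committed offsets, keyed by (group, topic, partition)."""

    def __init__(self):
        self._committed = {}

    def commit(self, group, topic, partition, next_offset):
        # The committed value is the offset of the *next* record to read.
        self._committed[(group, topic, partition)] = next_offset

    def resume_position(self, group, topic, partition, default=0):
        # Groups with no commit fall back to `default` (like "earliest").
        return self._committed.get((group, topic, partition), default)


log = [f"msg-{i}" for i in range(8)]  # hypothetical records in topicA-0
store = OffsetStore()

# consumer-1 (group "g") processes records 0..4, commits 5, then fails.
store.commit("g", "topicA", 0, 5)

# After the rebalance, another member of group "g" resumes at offset 5.
pos = store.resume_position("g", "topicA", 0)
print(log[pos:])  # ['msg-5', 'msg-6', 'msg-7']

# A different group has no commit for this partition and starts from 0.
print(store.resume_position("other-group", "topicA", 0))  # 0
```

Because the key includes the group, a consumer in the same group continues where the failed one stopped, while another group keeps its own independent position.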

See also this answer: Kafka consumer for multiple topic