I am trying to run multiple consumers with the same groupId against multiple partitions of a Kafka topic, so that I can scale message consumption.
The Kafka documentation says:
If all the consumer instances have the same consumer group, then the records will effectively be load-balanced over the consumer instances.
Having consumers as part of the same consumer group means providing the “competing consumers” pattern with whom the messages from topic partitions are spread across the members of the group. Each consumer receives messages from one or more partitions (“automatically” assigned to it) and the same messages won’t be received by the other consumers (assigned to different partitions). In this way, we can scale the number of consumers up to the number of the partitions (having one consumer reading only one partition).
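To make my reading of that description concrete, here is a minimal sketch in plain Java with no Kafka client involved. The class `GroupAssignmentSketch` and its `assign` helper are hypothetical names I made up for illustration, and the round-robin rule is an assumption, not Kafka's actual assignor. It only models the idea that each partition has exactly one owner within a group and that consumers beyond the partition count stay idle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: this is NOT Kafka's real partition assignment logic.
class GroupAssignmentSketch {

    // Hypothetical helper: spreads partitions 0..numPartitions-1 round-robin
    // over the consumers of a single group, so each partition has one owner.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one consumer");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Two consumers, four partitions: each consumer owns two partitions.
        System.out.println(assign(Arrays.asList("consumer-a", "consumer-b"), 4));
        // prints {consumer-a=[0, 2], consumer-b=[1, 3]}

        // Three consumers, two partitions: the third consumer gets nothing.
        System.out.println(assign(Arrays.asList("consumer-a", "consumer-b", "consumer-c"), 2));
        // prints {consumer-a=[0], consumer-b=[1], consumer-c=[]}
    }
}
```

This is the behaviour I expected to get by starting more than one application with the same groupId.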
But when I deploy multiple Spark applications with the same groupId, I get the following exception:
java.lang.IllegalStateException: Previously tracked partitions [cpq.cluster-1] been revoked by Kafka because of consumer rebalance. This is mostly due to another stream with same group id joined, please check if there're different streaming application misconfigure to use same group id. Fundamentally different stream should use different group id
at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.latestOffsets(DirectKafkaInputDStream.scala:200)
at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:228)
According to the exception, I cannot have multiple consumers with the same groupId, so I cannot load-balance in my Spark application. I can only assign one consumer per topic partition, which seems to contradict the Kafka documentation.
How can I run multiple consumers with the same consumer groupId so that the load is balanced across them?