
I submitted multiple Spark Streaming jobs that consume the same Kafka topic with the same "group.id". According to the plain Kafka documentation, multiple consumers with the same "group.id" join the same consumer group, and the partitions of the topic are split among those consumers. However, in my test the two Spark Streaming jobs each consumed all partitions of the topic (the partitions were not split between them), and no repartitioning or exception occurred during the whole process. Does anyone here know how Spark manages Kafka partition offsets differently from the plain Kafka platform? Could this be because ZooKeeper manages Kafka offsets in Spark, rather than Kafka managing them itself?
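For illustration, here is a minimal sketch of how such a job might be wired up, assuming the spark-streaming-kafka-0-10 direct stream API; the broker address, topic name, and group id are placeholders, not the actual values from my setup:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object SharedGroupJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("shared-group-job")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Both streaming jobs are submitted with the same group.id.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "shared-group",
      "auto.offset.reset"  -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Array("my-topic"), kafkaParams)
    )

    // Log which partitions this job actually reads in each batch,
    // to check whether the two jobs really share the topic.
    stream.foreachRDD { rdd =>
      rdd.map(r => (r.partition, 1L)).reduceByKey(_ + _).collect().foreach {
        case (p, n) => println(s"partition=$p records=$n")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```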


1 Answer


Ideally, the number of consumers should equal the number of partitions of your Kafka topic; if this ratio is not one to one, you get an imbalance (see the sketch after this list).

Number of partitions > number of consumers: some consumers will consume from more than one partition.

Number of partitions < number of consumers: some consumers will remain idle.
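For comparison, here is a minimal sketch of a plain Kafka consumer (outside Spark). Running several copies of this program with the same group.id splits the topic's partitions among them via Kafka's group coordinator; the broker address, topic name, and group id are placeholders:

```scala
import java.time.Duration
import java.util.Properties
import scala.jdk.CollectionConverters._   // Scala 2.13; use scala.collection.JavaConverters on 2.12
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.serialization.StringDeserializer

object PlainGroupConsumer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("group.id", "shared-group")   // same group.id in every running instance
    props.put("key.deserializer", classOf[StringDeserializer].getName)
    props.put("value.deserializer", classOf[StringDeserializer].getName)

    val consumer = new KafkaConsumer[String, String](props)
    // subscribe() uses Kafka's group coordinator, so partitions are
    // rebalanced across all live members of the group.
    consumer.subscribe(List("my-topic").asJava)

    while (true) {
      val records = consumer.poll(Duration.ofMillis(500))
      for (record <- records.asScala) {
        println(s"partition=${record.partition} offset=${record.offset} value=${record.value}")
      }
    }
  }
}
```

Running two copies of this program against a topic with, say, four partitions typically leaves each copy with two partitions, which is the splitting behavior described above.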