1 vote

I use Spark's KafkaUtils.createDirectStream to create consumers in my Spark Streaming application. When I run a single instance of my consumer app it works perfectly, but when I launch a second instance with the same consumer group, one of the instances gets an exception within a few seconds: ...IllegalStateException: No current assignment for partition testTpc1-1

My consumer code is fairly straightforward:

import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)

I also have enable.auto.commit set to false in kafkaParams.
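
For context, kafkaParams is roughly the following (the broker address and group id shown here are placeholders, not my real values):

import org.apache.kafka.common.serialization.StringDeserializer

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",            // placeholder broker list
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "my-consumer-group",                   // placeholder group id
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)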

How can I run multiple consumers from the same consumer group against a single Kafka topic (with multiple partitions)?

Kafka 0.10, Spark 2.2, Scala 2.11

UPDATE: Reading the answer here: Spark Direct Streaming - consume same message in multiple consumers, I see that the direct stream does not support multiple consumers in the same consumer group. I guess I need to find a different solution. So my next question is: does the direct stream offer anything to compensate for not being able to add more consumers to the group (i.e. more instances doing the same work)?


1 Answer

0 votes

The whole point of Spark is to let you parallelize execution over your data without "manual" load balancing. Consumer groups are designed for the opposite case: you scale "manually" by adding new instances to the consumer group.
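
To illustrate: with the direct stream, each Kafka partition already maps to one Spark partition, so a single application instance spreads work across its executor cores on its own. A rough sketch (the processing inside the loop is just a placeholder):

import org.apache.spark.TaskContext
import org.apache.spark.streaming.kafka010.{HasOffsetRanges, OffsetRange}

stream.foreachRDD { rdd =>
  // The RDD produced by the direct stream has one partition per Kafka topic-partition.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  rdd.foreachPartition { records =>
    // Runs as one task per Kafka partition, in parallel across executor cores.
    val range: OffsetRange = offsetRanges(TaskContext.get.partitionId())
    println(s"processing ${range.topic}-${range.partition} from ${range.fromOffset} to ${range.untilOffset}")
    records.foreach(_ => ()) // placeholder for real per-record work
  }
}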

The real question is about the scalability of your Spark application - you need to add more details about what kind of processing you're doing, what performance problems you're seeing, how many partitions your topic has, etc.
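
If per-record processing is the bottleneck rather than Kafka throughput, one option worth trying is to capture the offsets, shuffle the records out to more tasks, and commit offsets yourself since enable.auto.commit is false. A sketch under assumptions - the parallelism of 16 and the println are placeholders:

import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

stream.foreachRDD { rdd =>
  // Capture offsets before any shuffle, while the RDD still maps 1:1 to Kafka partitions.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  rdd.map(record => record.value)                       // pull out the payload
    .repartition(16)                                    // assumed value: spread heavy processing over more tasks
    .foreachPartition(_.foreach(value => println(value))) // placeholder for real per-record work

  // With enable.auto.commit=false, commit the processed offsets back to Kafka on the driver.
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}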