0
votes

I am trying to upgrade my Kafka client from 0.8.2 to 0.9.0.1 to reduce the load on our ZooKeeper cluster, and I have run into the following questions:

  1. The Kafka consumer protocol says that "The join group request will park at the coordinator until all expected members have sent their own join group request". I found that the join group request is triggered by poll(), and that poll() does not return until the group rebalance has finished. Does that mean I need as many consumer threads as there are consumers, so that all consumers can send their join group requests at the same time? If I have more than 10000 partitions and I want each partition to have its own consumer, does that mean I need more than 10000 consumer threads?

  2. To trigger a heartbeat, I need to call poll(). If I don't want to fetch new messages because the previous ones are still being processed, can I do that with consumer.pause() -> consumer.poll() -> consumer.resume()? Is there a better way?

1 Answer

1
votes

A consumer can read multiple partitions, so in general a single consumer is sufficient: it can assign all partitions to itself. However, if you want each partition to have its own consumer, you will of course need one consumer per partition.
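To illustrate that the number of consumer threads does not have to match the number of partitions, here is a minimal sketch that spreads partition ids over a small, fixed number of consumers. The class `PartitionSplitter` and its `split` method are made up for this example and are not part of the Kafka client; each resulting list could be turned into `TopicPartition` objects and handed to one consumer's `assign` call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Kafka client.
final class PartitionSplitter {

    /**
     * Distributes partition ids 0..partitionCount-1 round-robin
     * over consumerCount buckets (one bucket per consumer thread).
     */
    static List<List<Integer>> split(int partitionCount, int consumerCount) {
        if (consumerCount <= 0) {
            throw new IllegalArgumentException("consumerCount must be positive");
        }
        List<List<Integer>> buckets = new ArrayList<>();
        for (int c = 0; c < consumerCount; c++) {
            buckets.add(new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            // Partition p goes to consumer p mod consumerCount.
            buckets.get(p % consumerCount).add(p);
        }
        return buckets;
    }
}
```

With `PartitionSplitter.split(10000, 8)`, each of the 8 consumers would own 1250 partitions, so 8 threads are enough instead of 10000.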

About joining groups: if you have multiple consumers and a rebalance is in progress, the rebalance will not block forever because a timeout applies. If a consumer does not send a join group request within the timeout, it drops out of the group (for now) and the rebalance can finish. If the late consumer comes back and sends a join group request, a new rebalance is triggered.
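As far as I understand, in the 0.9 client the timeout in question is governed by the consumer's `session.timeout.ms` setting, so it is worth setting it deliberately. Below is a rough sketch of building the consumer properties with plain `java.util.Properties`. The class name `ConsumerSettings` is made up, the string deserializers are an assumption about your message format, and the one-third ratio for `heartbeat.interval.ms` is only the commonly recommended upper bound, not a requirement.

```java
import java.util.Properties;

// Hypothetical helper that only assembles configuration; it does not talk to Kafka.
final class ConsumerSettings {

    static Properties build(String bootstrapServers, String groupId, int sessionTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // A consumer that misses this deadline is dropped from the group.
        props.setProperty("session.timeout.ms", Integer.toString(sessionTimeoutMs));
        // Heartbeats should be sent well within the session timeout.
        props.setProperty("heartbeat.interval.ms", Integer.toString(sessionTimeoutMs / 3));
        // Assumption: keys and values are plain strings.
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```

The resulting `Properties` object can be passed to the `KafkaConsumer` constructor. Note that the broker restricts the allowed session timeout range, so a very large value may be rejected.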

Pause, poll, resume is the right thing to do. Heads-up: this is going to change with KIP-62, which introduces a background heartbeat thread in the consumer.
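The pause, poll, resume sequence could be sketched roughly like this. To keep the example self-contained, it is written against a small stand-in interface, `PausableConsumer`, which is hypothetical and not part of Kafka. With the real client, `pauseAll` and `resumeAll` would call `pause` and `resume` on the consumer's currently assigned partitions; the exact signatures of those methods differ between client versions, so check the one you use.

```java
import java.util.List;

// Hypothetical stand-in for the three KafkaConsumer calls used below.
interface PausableConsumer<R> {
    void pauseAll();               // pause every assigned partition
    List<R> poll(long timeoutMs);  // sends heartbeats; fetches only from unpaused partitions
    void resumeAll();              // resume every assigned partition
}

final class HeartbeatOnlyPoll {

    /** Calls poll() only to keep the group membership alive, without fetching records. */
    static <R> void heartbeat(PausableConsumer<R> consumer) {
        consumer.pauseAll();
        try {
            List<R> records = consumer.poll(0);
            if (!records.isEmpty()) {
                // Should not happen while everything is paused.
                throw new IllegalStateException("paused consumer returned records");
            }
        } finally {
            // Always resume, even if poll() throws.
            consumer.resumeAll();
        }
    }
}
```

You would call `heartbeat` periodically from the polling thread while the previously fetched messages are still being processed, at an interval well below the session timeout.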