1
votes

I have a multi-partition topic that is consumed by multiple consumers in the same group. My goal is to maximize consumption throughput, i.e. to let any consumer consume messages from any partition.

I know this looks impossible, since within a consumer group only one consumer can consume from a given partition.

Is it possible to use the REST Proxy to achieve this? For example, polling all the Proxy consumer instances.

Thanks.

2
Any consumer can already consume from any partition... Parallelism is bounded by the number of partitions in the topic, so you run multiple applications, up to that number, to maximize consumption. What issue are you actually trying to solve? - OneCricketeer
Thanks for your reply. I am trying to avoid the situation where some consumers are idle while there are still messages in the topic. Is that possible? - zwush
Yes. If you run more consumer threads than partitions, then the extras are idle. - OneCricketeer
Actually, I don't want to have any idle consumers while there are pending messages in the topic. It is kind of like a pool of new messages consumable by all consumers. - zwush
Okay. Then start as many threads or separate applications as there are partitions, and no more. I'm still not sure I understand your problem. - OneCricketeer

2 Answers

1
votes

Kafka consumers, by default, are configured to consume from as many partitions as possible. If you have multiple simultaneous consumers on the same topic, using the same consumer group ID, Kafka will automatically distribute the topic's partitions across all of those consumers. This is by design, so you can scale consumption quickly by adding more consumers.

You can, optionally, instruct the Kafka consumer to consume only from specific partitions, even down to a single one, but you'd have to do that explicitly.
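
To make the automatic distribution concrete, here is a minimal sketch in plain Python. It is a toy model, not Kafka client code: the function names (`assign_partitions`, `idle_consumers`) and the simple round-robin rule are assumptions for illustration, and Kafka's real assignors differ in detail. It does show the point made in the comments: once there are more consumers than partitions, the extras get nothing.

```python
def assign_partitions(num_partitions, consumers):
    """Hand out partition ids to consumers in round-robin order (toy model)."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


def idle_consumers(assignment):
    """Return the consumers that were given no partition at all."""
    return [consumer for consumer, parts in assignment.items() if not parts]


# 6 partitions, 3 consumers: every consumer owns two partitions.
balanced = assign_partitions(6, ["c1", "c2", "c3"])

# 6 partitions, 8 consumers: two consumers own nothing and sit idle.
oversized = assign_partitions(6, [f"c{i}" for i in range(1, 9)])

print(balanced)
print(idle_consumers(oversized))
```

In this model each partition has exactly one owner within the group, so adding a ninth consumer would only lengthen the idle list.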

0
votes

The best way to maximize consumption throughput is to have one consumer (in the same group) reading from each partition.

To improve further, you may also review:

  • The number of partitions: you could increase it so that you can add more consumers and increase throughput
  • How messages are balanced across partitions: a bad key selection can send all messages to the same partition
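
The effect of key selection can be sketched with a small simulation. This is an illustration under stated assumptions, not Kafka's partitioner: MD5 stands in for the hash Kafka's default partitioner uses, and `partition_for`, `load_per_partition`, and the key names are hypothetical. The principle is the same: identical keys always land on the same partition.

```python
import hashlib
from collections import Counter


def partition_for(key, num_partitions):
    """Map a key to a partition by hashing it (MD5 is a stand-in hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def load_per_partition(keys, num_partitions):
    """Count how many messages each partition would receive."""
    counts = Counter(partition_for(key, num_partitions) for key in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Bad key choice: every message carries the same key.
skewed = load_per_partition(["tenant-1"] * 1000, 4)

# Better key choice: many distinct keys.
spread = load_per_partition([f"order-{i}" for i in range(1000)], 4)

print("same key:     ", skewed)
print("distinct keys:", spread)
```

With a single repeated key, one partition takes all 1000 messages and the consumers of the other three have nothing to do, no matter how many consumers are running.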

Also, as a reminder, only one consumer per partition is allowed within a consumer group, to avoid concurrency issues. What would happen if two consumers committed different offsets? You would end up reading messages twice or skipping some of them!
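
A toy model of that commit race, in plain Python. The assumptions are loud ones: a single committed offset per partition where the last commit wins, the offset meaning "next message to read", and a hypothetical helper `after_restart`. None of this is Kafka client API; it only illustrates why two committers on one partition are a problem.

```python
def after_restart(log, processed, commit_order):
    """Work out what a restarted consumer re-reads or never sees.

    log          -- messages in the partition, in offset order
    processed    -- messages that were actually handled before the restart
    commit_order -- offsets committed, in the order the commits arrived
    """
    committed = commit_order[-1]  # the last commit overwrites earlier ones
    reread = log[committed:]      # consumption resumes at the committed offset
    duplicates = [m for m in reread if m in processed]
    skipped = [m for m in log[:committed] if m not in processed]
    return duplicates, skipped


log = ["m0", "m1", "m2", "m3", "m4"]

# Case 1: A handled m0..m3 and commits 4, then B (who only got as far as m1)
# commits 2. The group rewinds, so m2 and m3 are handled a second time.
dup_1, skip_1 = after_restart(log, {"m0", "m1", "m2", "m3"}, [4, 2])

# Case 2: B handled m2 and m3 and commits 4 while A is still working on m0
# and m1, then A crashes. After the restart m0 and m1 are never handled.
dup_2, skip_2 = after_restart(log, {"m2", "m3"}, [4])

print("case 1 duplicates:", dup_1, "skipped:", skip_1)
print("case 2 duplicates:", dup_2, "skipped:", skip_2)
```

With a single owner per partition, commits only ever move forward past work that owner has finished, which rules out both cases.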