2
votes

I have a kafka topic which has 50 partitions, and the publishing message rate is 500 a second. As port pf consumer design, we can either have Multiple consumers with their own threads or Single consumer, multiple worker processing threads using concurrent listeners.

Lets say each message takes 100ms to process. If we use a single consumer and use the concurrent Message Listener container then we will end up having 50 threads in a single deployment.

If we deploy multiple applications with same group id and assign each application 10 concurrent threads then we will require 5 deployments. Re balancing can be an issue if a one of the application restarts

Can you please suggest a good approach or splitting the topic is recommended if there are too many partitions

1
I would recommend multiple application instances deployed/scaled via kubernetes, for example... That's how most large companies are doing itOneCricketeer
multiple application instances with same group id and each instance can have concurrent message listeners of 10 (for ex). Please confirm my understandingsam
Consumers are single threaded. You could use Kafka Streams API, however, where consumer threads can be configured. And yes, you have to have the same group ID to distribute processing from the same topicOneCricketeer
If we use ConcurrentMessageListenerContainer and set concurrent threads to 10 then the process will spawn 10 consumer threads and assign it o the partitions. Please correct my understandingsam
I don't use Spring, so I am not sureOneCricketeer

1 Answers

1
votes

Yes, your understanding is correct 5 instances with concurrency 10 will have one consumer thread per partition. If you deploy only 2 instances, the partitions will be distributed across 20 threads.

You can mitigate rebalances by using the recent static group membership feature.

group.instance.id

A unique identifier of the consumer instance provided by the end user. Only non-empty strings are permitted. If set, the consumer is treated as a static member, which means that only one instance with this ID is allowed in the consumer group at any time. This can be used in combination with a larger session timeout to avoid group rebalances caused by transient unavailability (e.g. process restarts). If not set, the consumer will join the group as a dynamic member, which is the traditional behavior.