3
votes

I'm aware that the maximum number of active consumers in a consumer group is the number of partitions of a topic.

What's the best practice when consumers are slow to process messages? How can I achieve more parallelism?

An example: a topic with 6 partitions receives thousands of messages per second from producers, so I can have at most 6 active consumers in the group. Processing those messages is complex, and the consumers are much slower than the producers. As a result, the consumers are always behind the latest offset and the lag keeps increasing.

In a traditional MQ system, we simply add more and more consumers to stay up to date.

How can I achieve this with Kafka, given that the number of active consumers in a group is at most the number of partitions? Should I:

  • Configure the topic to have more partitions, allowing more consumers per group?
  • Route the messages from the consumer to a traditional MQ queue (but lose the ordering)?

What's the best practice for this situation?

1 Answer

4
votes

In Kafka, partitions are the unit of parallelism.

Without knowing your exact use case and requirements, it's hard to come up with precise recommendations, but there are a few options.

First, you should really consider having more partitions. Six partitions is a relatively small number; you could easily have 60, 120 or even more partitions (and the corresponding number of consumers). The amount of work each consumer has to do is then significantly reduced.
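As a rough way to pick a partition count, you can divide the produce rate by what a single consumer can process. Here is a minimal sketch in Python. The function name `partitions_needed`, the rates, and the `headroom` factor are all hypothetical values for illustration, not measurements from the question, and the calculation assumes messages are spread evenly across partitions.

```python
import math

def partitions_needed(produce_rate, per_consumer_rate, headroom=1.0):
    """Smallest partition count that lets one consumer per partition keep up.

    produce_rate:      messages per second written to the topic (assumed)
    per_consumer_rate: messages per second a single consumer can process (assumed)
    headroom:          safety multiplier for bursts, e.g. 1.5 for 50% spare capacity
    """
    if produce_rate < 0 or per_consumer_rate <= 0 or headroom <= 0:
        raise ValueError("rates and headroom must be positive")
    # Round up: a fractional partition still needs a whole consumer.
    return max(1, math.ceil(produce_rate * headroom / per_consumer_rate))

# Hypothetical numbers: 3000 msg/s produced, 100 msg/s per consumer.
print(partitions_needed(3000, 100))       # 30
print(partitions_needed(3000, 100, 1.5))  # 45
```

Keep in mind that a topic's partition count can be increased later but not decreased, and that adding partitions changes which partition a given key maps to under the default partitioner, so it's worth leaving some headroom up front.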

If your requirements allow, you can also consume at a fast rate and spread the processing of records across many workers. With solutions like this it's harder to maintain ordering, but if you don't need ordering, it's worth considering.
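One way to sketch the worker approach, assuming per-key ordering is enough for your use case, is to hash each record's key to a fixed worker so that records with the same key are still handled in order. The example below uses only the Python standard library. The `records` list stands in for a batch returned by a consumer's poll, and `process_in_parallel`, `handler` and `num_workers` are hypothetical names, not part of any Kafka client.

```python
import queue
import threading
import zlib

_STOP = object()  # sentinel telling a worker to exit

def process_in_parallel(records, handler, num_workers=4):
    """Process (key, value) records on worker threads, keeping per-key order.

    Returns a dict mapping each key to its handler results, in the order
    the records for that key were received.
    """
    queues = [queue.Queue() for _ in range(num_workers)]
    outputs = [[] for _ in range(num_workers)]  # one list per worker, no sharing

    def worker(idx):
        while True:
            item = queues[idx].get()
            if item is _STOP:
                return
            key, value = item
            outputs[idx].append((key, handler(key, value)))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()

    # Same key -> same worker, so that key's records stay in order.
    for key, value in records:
        idx = zlib.crc32(key.encode("utf-8")) % num_workers
        queues[idx].put((key, value))

    for q in queues:
        q.put(_STOP)
    for t in threads:
        t.join()

    by_key = {}
    for out in outputs:
        for key, result in out:
            by_key.setdefault(key, []).append(result)
    return by_key

# Hypothetical batch of (key, value) records.
records = [("user-1", 1), ("user-2", 1), ("user-1", 2), ("user-2", 2), ("user-1", 3)]
print(process_in_parallel(records, lambda key, value: value * 10))
```

With a real consumer you would also need to commit offsets only after the workers have finished the corresponding records; otherwise a crash can skip records that were handed off but never processed.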

I'm not sure how routing messages through an MQ queue would really help in this scenario. If you are still reading more slowly than you are writing, the amount of data in the queue will grow until you have no disk space left.

Kafka is better designed to serve as a buffer between your producers and consumers, so just ensure the retention limits on your topics allow some flexibility on the consumer side without losing data.
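To get a feel for how much flexibility a given retention limit buys, here is a back-of-the-envelope sketch in Python. It assumes time-based retention, constant rates, and a consumer that starts with zero lag; the function name and the rates are hypothetical. The 168 hours used below corresponds to Kafka's default broker retention of 7 days. Real brokers delete whole log segments, so the actual cutoff is approximate.

```python
import math

def hours_until_data_loss(produce_rate, consume_rate, retention_hours):
    """Hours until the oldest unread message becomes older than the retention limit.

    After t hours the consumer is reading messages produced at time
    t * consume_rate / produce_rate, so their age is
    t * (1 - consume_rate / produce_rate). Solving age == retention_hours
    for t gives the formula below.
    """
    if consume_rate >= produce_rate:
        return math.inf  # the consumer keeps up, so lag never reaches the limit
    return retention_hours * produce_rate / (produce_rate - consume_rate)

# Hypothetical rates: 3000 msg/s produced, 2500 msg/s consumed, 7-day retention.
print(hours_until_data_loss(3000, 2500, 168))  # 1008.0
```

In this made-up scenario the consumer group would have about 42 days to catch up or scale out before unread messages start being deleted.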