We are running a 16-node Kafka cluster on AWS. Each node is an m4.xlarge EC2 instance with a 2 TB EBS (st1) disk. The Kafka version is 0.10.1.0, and we have about 100 topics at the moment. Some busy topics receive about 2 billion events per day, while some low-volume topics receive only thousands per day.
Most of our topics use a UUID as the partition key when we produce messages, so messages are spread quite evenly across partitions.
We have quite a lot of consumers reading from this cluster using consumer groups, and each consumer has a unique group id. Some consumer groups commit offsets every 500 ms; others commit offsets synchronously as soon as they finish processing a batch of messages.
Recently we observed that some of the brokers are far busier than the others. After some digging, we found that a lot of the traffic actually goes to "__consumer_offsets", so we created a tool to show the high watermark of each partition in "__consumer_offsets". It revealed that writes are very unevenly distributed across those partitions.
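Because a high watermark is cumulative since the topic was created, comparing two snapshots taken some seconds apart gives a better picture of the current write rate per partition. A minimal sketch, assuming each snapshot is a plain dict of partition number to high watermark collected by a tool like ours (the function names `write_rates` and `hottest` are hypothetical):

```python
from typing import Dict, List, Tuple


def write_rates(before: Dict[int, int], after: Dict[int, int],
                seconds: float) -> Dict[int, float]:
    """Offsets appended per second for each partition between two snapshots."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # A partition missing from the first snapshot is treated as starting at 0.
    return {p: (after[p] - before.get(p, 0)) / seconds for p in after}


def hottest(rates: Dict[int, float], n: int = 3) -> List[Tuple[int, float, float]]:
    """Top-n partitions as (partition, rate, share of total write rate)."""
    total = sum(rates.values()) or 1.0  # avoid division by zero when idle
    ranked = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return [(p, r, r / total) for p, r in ranked]
```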
According to "Consumer offset management in Kafka", this seems to be intended behaviour: each consumer group has a single coordinator broker (the leader of the "__consumer_offsets" partition the group maps to), so all of a group's committed offsets go to that broker, and only the "group.id" is used to choose the partition.
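The group-to-partition mapping can be reproduced offline. The sketch below assumes the broker computes `abs(groupId.hashCode) % offsets.topic.num.partitions`, which is what `GroupMetadataManager.partitionFor` does in the 0.10.x line (using Kafka's `Utils.abs`, which maps `Integer.MIN_VALUE` to 0), and that `offsets.topic.num.partitions` is at its default of 50. Java's `String.hashCode` is replicated in Python because the broker runs on the JVM; the function names are our own:

```python
def java_string_hash(s: str) -> int:
    """Replicate java.lang.String.hashCode (32-bit signed, over UTF-16 units)."""
    data = s.encode("utf-16-be")
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF  # emulate Java int overflow
    return h - 0x100000000 if h >= 0x80000000 else h


def kafka_abs(n: int) -> int:
    """Mirror Kafka's Utils.abs: Integer.MIN_VALUE maps to 0."""
    return 0 if n == -0x80000000 else abs(n)


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """__consumer_offsets partition that stores the commits of group_id."""
    return kafka_abs(java_string_hash(group_id)) % num_partitions
```

For example, `offsets_partition_for("hello")` returns 22, because `"hello".hashCode()` is 99162322.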
Since some of our consumers read from those very busy topics, their offset commits cause a lot of traffic to the "__consumer_offsets" partition on the broker that coordinates each such consumer group.
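A rough estimate helps compare the two commit styles we use. Each offset commit writes one record per committed partition to "__consumer_offsets". A consumer that commits synchronously after every batch therefore commits at a rate that grows with topic volume, while a timer-based committer does not. A minimal sketch, assuming every batch is full and every commit covers the same number of partitions (both simplifications; the function names are hypothetical):

```python
SECONDS_PER_DAY = 86_400


def per_batch_records_per_second(events_per_day: float, batch_size: int,
                                 partitions_per_commit: int) -> float:
    """Offset records/s for a group committing synchronously after each batch."""
    events_per_second = events_per_day / SECONDS_PER_DAY
    commits_per_second = events_per_second / batch_size
    return commits_per_second * partitions_per_commit


def timer_records_per_second(interval_ms: float,
                             partitions_per_commit: int) -> float:
    """Offset records/s for a group committing on a fixed timer."""
    return (1000.0 / interval_ms) * partitions_per_commit
```

For instance, at 2 billion events per day with full batches of 500 records and a hypothetical 4 partitions per commit, the per-batch style gives roughly 185 offset records per second for that one group, all landing on the same coordinator.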
My questions are:
1. Is there a way to make sure that the consumer groups consuming from busy topics don't fall onto the same broker? We don't want to create a hotspot.
2. For consumers that read from busy topics (topics with billions of messages per day), is it a good idea to use consumer groups?
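For question 1, one possible approach (not something Kafka does automatically) is to pick group ids for the busy consumers so that they hash onto "__consumer_offsets" partitions led by different brokers. The sketch below assumes the `abs(groupId.hashCode) % partitions` mapping used by 0.10.x brokers, and a dict of partition to leader broker id covering every partition 0..N-1, for example transcribed from `kafka-topics.sh --describe --topic __consumer_offsets`. The function names are hypothetical, and the result only holds until partition leadership changes (broker failure or leader election):

```python
from collections import defaultdict
from typing import Dict, List


def _java_string_hash(s: str) -> int:
    """java.lang.String.hashCode as a signed 32-bit int."""
    data = s.encode("utf-16-be")
    h = 0
    for i in range(0, len(data), 2):
        h = (31 * h + ((data[i] << 8) | data[i + 1])) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def coordinator_for(group_id: str, leaders: Dict[int, int]) -> int:
    """Broker id leading the __consumer_offsets partition for group_id."""
    h = _java_string_hash(group_id)
    partition = (0 if h == -0x80000000 else abs(h)) % len(leaders)
    return leaders[partition]


def shared_coordinators(group_ids: List[str],
                        leaders: Dict[int, int]) -> Dict[int, List[str]]:
    """Brokers that would coordinate more than one of the given groups."""
    by_broker: Dict[int, List[str]] = defaultdict(list)
    for g in group_ids:
        by_broker[coordinator_for(g, leaders)].append(g)
    return {b: gs for b, gs in by_broker.items() if len(gs) > 1}
```

An empty result means no two of the listed groups share a coordinator. If some do, renaming one of them and re-checking spreads them out; note that a renamed group starts with no committed offsets, so `auto.offset.reset` decides where it begins reading.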
Thanks in advance