Why does a large number of partitions affect the performance of a Kafka cluster? What are the best practices for managing and monitoring partitions? What is the best practice for partition count in a cluster?
1 Answer
The Kafka controller is responsible for tracking cluster state and propagating it to every broker in the cluster. The more partitions there are, the more work the controller has to do: it must broadcast topic metadata to all other brokers, so a larger number of partitions means more data sent over the network.
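For reference, here is a minimal sketch (hypothetical class name, placeholder bootstrap address) that uses the Kafka AdminClient to see which broker currently holds the controller role, which can be handy when investigating metadata propagation load:

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;

public class ShowController {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder bootstrap address; replace with your brokers.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // describeCluster() reports the broker currently acting as controller.
            Node controller = admin.describeCluster().controller().get();
            System.out.println("Active controller: broker " + controller.id()
                    + " at " + controller.host() + ":" + controller.port());
        }
    }
}
```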
The number of partitions a cluster can host depends on the cluster's resources: a cluster with more powerful hosts can handle more topic partitions. Monitor the number of partitions in your cluster, the partition distribution among brokers, and system metrics (CPU, I/O, network, etc.) to determine how many partitions fit your setup. We have seen issues after hosting more than 4,000 topic partitions on a single host, and it is generally good practice to keep the number of partition replicas under 1,000 per host. You can also check the controller log for topic metadata update failures.
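A minimal sketch of how you might check partition counts and per-broker replica distribution with the Kafka AdminClient (the bootstrap address is a placeholder, and `allTopicNames()` assumes kafka-clients 3.1+; older versions can use `all()` instead):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.TopicPartitionInfo;

public class PartitionReplicaCount {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder bootstrap address; replace with your brokers.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // List the topics visible to this client and describe them.
            Set<String> topicNames = admin.listTopics().names().get();
            Map<String, TopicDescription> descriptions =
                    admin.describeTopics(topicNames).allTopicNames().get();

            // Count the partition replicas hosted on each broker.
            Map<Integer, Integer> replicasPerBroker = new HashMap<>();
            int totalPartitions = 0;
            for (TopicDescription desc : descriptions.values()) {
                for (TopicPartitionInfo p : desc.partitions()) {
                    totalPartitions++;
                    for (Node replica : p.replicas()) {
                        replicasPerBroker.merge(replica.id(), 1, Integer::sum);
                    }
                }
            }

            System.out.println("Total partitions in cluster: " + totalPartitions);
            replicasPerBroker.forEach((brokerId, count) -> System.out.println(
                    "Broker " + brokerId + " hosts " + count + " partition replicas"));
        }
    }
}
```

Running this periodically (or feeding the same numbers from your metrics system) makes it easy to spot brokers that are drifting past the roughly 1,000-replicas-per-host guideline or carrying a disproportionate share of the cluster's partitions.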