I have a Kafka cluster with 5 brokers. After scaling it down to 3 brokers, leader election took place several times. Eventually a single broker ended up as the leader for all 3 partitions of one of my topics:
Topic: test PartitionCount:3 ReplicationFactor:3
Topic: test Partition: 0 Leader: 2 Replicas: 2,0,1 Isr: 2,1,0
Topic: test Partition: 1 Leader: 2 Replicas: 3,1,2 Isr: 2,1
Topic: test Partition: 2 Leader: 2 Replicas: 4,2,3 Isr: 2
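(For reference, the above is the output of the topic describe command, something like the following; the bootstrap address is a placeholder for my actual cluster, and older versions take --zookeeper instead:)

    kafka-topics.sh --describe --topic test --bootstrap-server localhost:9092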
Brokers 2, 1, and 0 are the ones still running. My reading of the ISR lists:
Partition 0 has replicas on 2, 0, 1. All of those brokers are running, so ISR = 2,1,0.
Partition 1 has replicas on 3, 1, 2, but broker 3 was removed, so ISR = 2,1.
Partition 2 has replicas on 4, 2, 3, but brokers 4 and 3 were both removed, so ISR = 2.
Note that only broker 2 was elected leader, for every partition. Even if we assume it had the highest high watermark, every replica in a partition's ISR is by definition in sync and therefore has the same committed offsets for that partition (otherwise it would have been removed from the ISR), so any ISR member should have been an eligible leader.
I have waited quite a long time (there is a timeout, replica.lag.time.max.ms, after which a replica that is not keeping up gets removed from the ISR), but the leadership is still assigned this way.
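For context, the ISR timeout I am referring to is controlled by this broker setting (the value shown is the documented default in recent versions, not something I tuned):

    # server.properties -- documented default, unchanged on my brokers
    replica.lag.time.max.ms=30000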
Leaders could instead be evenly distributed (load balanced) across the live brokers. For example:
partition 0 leader could be 0
partition 1 leader could be 1
partition 2 leader could be 2
Why is this not so?
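To be clear, I know leadership can be moved by hand; on Kafka 2.4+ something like the commands below should work (the JSON file name is illustrative, I have not actually run this). My question is why Kafka does not distribute leaders on its own:

    # Ask the controller to move each partition's leadership back to its
    # preferred (first-listed) replica
    kafka-leader-election.sh --bootstrap-server localhost:9092 \
        --election-type PREFERRED --all-topic-partitions

    # For partitions 1 and 2 the preferred replicas (brokers 3 and 4) are
    # gone, so the replica lists would need rewriting first, e.g.:
    kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
        --reassignment-json-file reassign.json --execute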
Note: I did not enable unclean leader election; unclean.leader.election.enable is still at its default value (false).
If we assume that brokers 0 and 1 came up only after the leader election had already happened, why is there no re-election afterwards? If the ISRs are updated, ideally the leaders should be as well, shouldn't they?
In other words, if Kafka knows that 0 and 1 are up and have in-sync replicas, it SHOULD have conducted one more leader election.
Is there a specific reason why it does not?
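For completeness, as far as I know the automatic rebalancing settings are also untouched on my brokers, i.e. at their documented defaults:

    # server.properties -- documented defaults
    auto.leader.rebalance.enable=true
    leader.imbalance.check.interval.seconds=300
    leader.imbalance.per.broker.percentage=10

My understanding is that this automatic rebalancer only ever moves leadership back to the preferred (first-listed) replica of each partition, and for partitions 1 and 2 those preferred replicas (brokers 3 and 4) no longer exist; perhaps that is related, but I would appreciate confirmation.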