I have a Kafka cluster with 5 brokers. After scaling it down to 3 brokers, leader election took place several times. Eventually a single broker ended up as the leader for all 3 partitions of one of my topics:
Topic: test PartitionCount:3 ReplicationFactor:3
Topic: test Partition: 0 Leader: 2 Replicas: 2,0,1 Isr: 2,1,0
Topic: test Partition: 1 Leader: 2 Replicas: 3,1,2 Isr: 2,1
Topic: test Partition: 2 Leader: 2 Replicas: 4,2,3 Isr: 2
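(For reference, the above is the output of the topic describe command, something like the following; the bootstrap address is a placeholder for my actual cluster, and older versions take --zookeeper instead:)

    kafka-topics.sh --describe --topic test --bootstrap-server localhost:9092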
Brokers 2, 1, and 0 are the ones still running. My reading of the ISR lists:
Partition 0 has replicas on 2, 0, 1. All of those brokers are running, so ISR = 2,1,0.
Partition 1 has replicas on 3, 1, 2, but broker 3 was removed, so ISR = 2,1.
Partition 2 has replicas on 4, 2, 3, but brokers 4 and 3 were both removed, so ISR = 2.
Note that only broker 2 was elected leader, for every partition. Even if we assume it had the highest high watermark, every replica in a partition's ISR is by definition in sync and therefore has the same committed offsets for that partition (otherwise it would have been removed from the ISR), so any ISR member should have been an eligible leader.
I have waited quite a long time (there is a timeout, replica.lag.time.max.ms, after which a replica that is not keeping up gets removed from the ISR), but the leadership is still assigned this way.
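For context, the ISR timeout I am referring to is controlled by this broker setting (the value shown is the documented default in recent versions, not something I tuned):

    # server.properties -- documented default, unchanged on my brokers
    replica.lag.time.max.ms=30000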
Leaders could instead be evenly distributed (load balanced) across the live brokers. For example:
partition 0 leader could be 0
partition 1 leader could be 1
partition 2 leader could be 2
Why is this not so?
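To be clear, I know leadership can be moved by hand; on Kafka 2.4+ something like the commands below should work (the JSON file name is illustrative, I have not actually run this). My question is why Kafka does not distribute leaders on its own:

    # Ask the controller to move each partition's leadership back to its
    # preferred (first-listed) replica
    kafka-leader-election.sh --bootstrap-server localhost:9092 \
        --election-type PREFERRED --all-topic-partitions

    # For partitions 1 and 2 the preferred replicas (brokers 3 and 4) are
    # gone, so the replica lists would need rewriting first, e.g.:
    kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
        --reassignment-json-file reassign.json --execute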
Note: I did not enable unclean leader election; unclean.leader.election.enable is still at its default value (false).
If we assume that brokers 0 and 1 came up only after the leader election had already happened, why is there no re-election afterwards? If the ISRs are updated, ideally the leaders should be as well, shouldn't they?
In other words, if Kafka knows that 0 and 1 are up and have in-sync replicas, it SHOULD have conducted one more leader election.
Is there a specific reason why it does not?
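For completeness, as far as I know the automatic rebalancing settings are also untouched on my brokers, i.e. at their documented defaults:

    # server.properties -- documented defaults
    auto.leader.rebalance.enable=true
    leader.imbalance.check.interval.seconds=300
    leader.imbalance.per.broker.percentage=10

My understanding is that this automatic rebalancer only ever moves leadership back to the preferred (first-listed) replica of each partition, and for partitions 1 and 2 those preferred replicas (brokers 3 and 4) no longer exist; perhaps that is related, but I would appreciate confirmation.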