2
votes

I want to build an HA Kafka cluster that spans 2 availability zones.

I want to be able to continue to read and write to a topic, even if all the brokers in an AZ go down.

If I have at least 2 brokers in each AZ, a replication factor of 3, min.insync.replicas of 2 and acks set to all, then I think a producer write will be acked once at least one broker besides the leader has also acked the write. Does the rack-aware algorithm enforce that this in-sync replica must be located in the other AZ? The docs only mention replicas, not the ISR.

Will this enable me to continue reading and writing in the event of the loss of an AZ? If not, what is needed to achieve this?

1
Rack awareness applies to replica placement only, not to ISR membership. If your brokers 0 and 1 are in one AZ and 2 and 3 are in the other, the order used to pick which brokers host replicas with rack awareness enabled would be something like 0,2,1,3. This guarantees there are replicas in both AZs. I think with min.insync.replicas = 2, replication factor = 3 and 2 racks, it's possible to have only two in-sync replicas, both in the same rack (AZ). If that AZ fails, you may lose messages (if unclean leader election is enabled) or availability (the partition goes offline since the leader is gone). - vahid
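To make the scenario in the comment above concrete, here is a minimal Python sketch. It is not Kafka's actual assignment code: the round-robin placement is a deliberate simplification, and the names `BROKER_RACK`, `rack_alternated_brokers`, `assign_replicas` and `single_az_isrs` are made up for illustration. It only shows that a rack-spread replica list can still contain an ISR of size 2 that sits entirely in one AZ.

```python
from itertools import combinations

# Hypothetical layout taken from the comment: brokers 0,1 in one AZ, 2,3 in the other.
BROKER_RACK = {0: "az-a", 1: "az-a", 2: "az-b", 3: "az-b"}


def rack_alternated_brokers(broker_rack):
    """Interleave brokers across racks, e.g. 0,2,1,3 for the layout above."""
    by_rack = {}
    for broker, rack in sorted(broker_rack.items()):
        by_rack.setdefault(rack, []).append(broker)
    racks = sorted(by_rack)
    longest = max(len(brokers) for brokers in by_rack.values())
    ordered = []
    for i in range(longest):
        for rack in racks:
            if i < len(by_rack[rack]):
                ordered.append(by_rack[rack][i])
    return ordered


def assign_replicas(partition, ordered, replication_factor):
    """Simplified placement: take consecutive brokers from the interleaved list."""
    start = partition % len(ordered)
    return [ordered[(start + i) % len(ordered)] for i in range(replication_factor)]


def single_az_isrs(replicas, broker_rack, min_isr):
    """List every possible ISR of size min_isr whose members share one rack."""
    return [
        set(isr)
        for isr in combinations(replicas, min_isr)
        if len({broker_rack[b] for b in isr}) == 1
    ]


ordered = rack_alternated_brokers(BROKER_RACK)
replicas = assign_replicas(0, ordered, replication_factor=3)
print("replicas:", replicas)
print("single-AZ ISRs:", single_az_isrs(replicas, BROKER_RACK, min_isr=2))
```

Under these assumptions partition 0 gets replicas on brokers 0, 2 and 1, which covers both AZs, yet the ISR {0, 1} is entirely in az-a. If the replica on broker 2 has fallen out of sync and az-a then fails, no in-sync replica remains.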

1 Answer

1
votes

If you want a true HA Kafka cluster, you need to start with an HA Zookeeper ensemble, which typically means 3 availability zones. Unlike Kafka brokers, Zookeeper nodes need a quorum (a strict majority of the nodes in the ensemble) to operate, and you can't have a majority when half of your nodes are down.
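The quorum arithmetic can be checked with a small Python sketch. The function names `has_quorum` and `survives_any_single_az_loss` are illustrative, and the only rule assumed is the one stated above: the ensemble needs a strict majority of its nodes to be up.

```python
def has_quorum(total_nodes, nodes_up):
    """A strict majority of the ensemble must be up."""
    return nodes_up > total_nodes // 2


def survives_any_single_az_loss(nodes_per_az):
    """True if the ensemble keeps quorum no matter which single AZ fails."""
    total = sum(nodes_per_az)
    return all(has_quorum(total, total - lost) for lost in nodes_per_az)


# Two AZs: whichever AZ holds half or more of the nodes takes quorum down with it.
print(survives_any_single_az_loss([2, 2]))
print(survives_any_single_az_loss([3, 2]))
# Three AZs with one node each: any single AZ can fail and 2 of 3 nodes remain.
print(survives_any_single_az_loss([1, 1, 1]))
```

With only two AZs, one of them always holds at least half of the nodes, so losing that AZ breaks quorum however the nodes are split. That is why the answer points to a third AZ.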

The reason Zookeeper is important is that a proper HA Kafka cluster should not just allow reads and writes after a failure, but also allow new topic creation and new leader elections, both of which require Zookeeper to be operational.