1
votes

I'm trying to optimize a 3-broker Kafka cluster and have run into a very basic yet unclear issue.

Consider this configuration:

kafka.topic.partitions.number=10
kafka.partition.replications.number=3
min.insync.replicas=2

This means that:

  • There are 10 partitions per topic.
  • Each partition is replicated on all 3 brokers.
  • A partition is healthy if it has 2 in-sync replicas (which allows 1 broker to fail).

The question is about replication=3: is it really necessary?

As I understand it, a value of 2 would also allow:

  • 1 broker to fail
  • probably faster convergence and recovery
  • In the case of 2 failing brokers, the cluster will not function anyway.

BTW, my ZooKeeper is installed on the same machines as Kafka.

Thanks.

2 Answers

0
votes

First of all, to make it clearer:

min.insync.replicas=2 means that a partition must have at least 2 in-sync replicas for a produce request to succeed; otherwise your producer gets a NOT_ENOUGH_REPLICAS error and cannot produce messages. (I assume you set acks=all on the producer side.)
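The rule above can be written down as a tiny model. This is only a sketch of the decision, not Kafka code: the function name `can_accept_write` and its parameters are made up for illustration, and it assumes the partition leader is alive and counted in the ISR.

```python
def can_accept_write(isr, min_insync_replicas, acks):
    """Toy model of the check applied to a produce request.

    isr: set of broker ids currently in sync for the partition (leader included).
    acks: the producer's acks setting as a string ("0", "1" or "all").
    """
    if acks != "all":
        # min.insync.replicas is only enforced for acks=all (-1) producers.
        return True
    # With acks=all the write is rejected (NOT_ENOUGH_REPLICAS)
    # when fewer than min.insync.replicas replicas are in sync.
    return len(isr) >= min_insync_replicas


print(can_accept_write({1, 2, 3}, 2, "all"))  # True
print(can_accept_write({1}, 2, "all"))        # False
print(can_accept_write({1}, 2, "1"))          # True
```

So with min.insync.replicas=2, a partition that is down to a single in-sync replica stays readable but rejects acks=all writes.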

For your scenario, consider a topic with replication.factor = 2:

Suppose that your broker ids are 1, 2 and 3. At the beginning you have 3 healthy brokers, and when you describe the topic you will see something like this:

[image: topic description with all 3 brokers healthy]

What if the broker with id 2 crashes? Then you will have just one in-sync replica for partitions 1 and 2:

[image: topic description after broker 2 has crashed]

Your producer then cannot produce messages to those partitions, because min.insync.replicas = 2.

Of course you can reassign partitions at that point, but as you can see it is still a problem.

As a result, the best practice is to set replication.factor = 3 when min.insync.replicas = 2 and the cluster has 3 brokers.
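To make the comparison concrete, here is a minimal simulation of the question's setup (3 brokers, 10 partitions, min.insync.replicas = 2) with one broker down. It is a sketch under assumptions: `assign_replicas` uses plain round-robin placement, which is a simplification of how Kafka really places replicas, and both function names are hypothetical.

```python
def assign_replicas(brokers, partitions, replication_factor):
    """Simplified round-robin replica placement (an assumption, not Kafka's exact algorithm)."""
    n = len(brokers)
    return {
        p: [brokers[(p + i) % n] for i in range(replication_factor)]
        for p in range(partitions)
    }


def writable_partitions(assignment, failed, min_isr):
    """Partitions that still accept acks=all writes after the given brokers fail."""
    writable = []
    for partition, replicas in assignment.items():
        # Replicas on failed brokers drop out of the ISR.
        isr = [b for b in replicas if b not in failed]
        if len(isr) >= min_isr:
            writable.append(partition)
    return writable


brokers = [1, 2, 3]
for rf in (2, 3):
    assignment = assign_replicas(brokers, partitions=10, replication_factor=rf)
    ok = writable_partitions(assignment, failed={2}, min_isr=2)
    print(f"replication.factor={rf}: {len(ok)} of 10 partitions writable")
# replication.factor=2: 3 of 10 partitions writable
# replication.factor=3: 10 of 10 partitions writable
```

With replication.factor = 2, every partition that had a replica on the failed broker drops below min.insync.replicas; with replication.factor = 3, every partition still has 2 in-sync replicas. Which partitions are affected depends on the real placement, but the count of tolerated failures is replication.factor minus min.insync.replicas either way.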

0
votes

I run the same kind of cluster as yours, but with these differences:

  • ZooKeeper is installed on different nodes than Kafka
  • I have 3 Kafka broker nodes, but a replication factor of 2
  • I have 5 partitions per topic

Those are the differences I noticed compared to your configuration.

My cluster has been working fine for the last few months, and I haven't seen a single node failure among the Kafka brokers.

A replication factor of 3 is only reasonable when:

  • A node fails frequently
  • CPU and RAM consumption on the broker nodes is high
  • You are storing very important transactional data

Otherwise, a replication factor of 2 is sufficient for almost every use case.