Why am I able to publish messages further, even though min.insync.replicas=5?
This configuration is considered only when you set acks="-1" or acks="all". With any other acks value the broker does not check min.insync.replicas, so the publish succeeds.
The official Kafka documentation says:
When a producer sets acks to "all" (or "-1"), min.insync.replicas
specifies the minimum number of replicas that must acknowledge a write
for the write to be considered successful. If this minimum cannot be
met, then the producer will raise an exception (either
NotEnoughReplicas or NotEnoughReplicasAfterAppend).
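As a rough illustration of the rule quoted above, here is a small Python sketch of the broker-side check. It is a toy model, not Kafka code: the function `check_write` and its parameters are hypothetical, and only the exception name `NotEnoughReplicas` comes from the documentation.

```python
class NotEnoughReplicas(Exception):
    """Stand-in for the Kafka error of the same name (toy model only)."""


def check_write(acks: str, isr_size: int, min_insync_replicas: int) -> bool:
    """Return True if the modeled leader would accept the write.

    Hypothetical helper: min.insync.replicas is enforced only when the
    producer asks for acks="all" (or "-1").
    """
    if acks in ("all", "-1") and isr_size < min_insync_replicas:
        raise NotEnoughReplicas(
            f"ISR size {isr_size} is below "
            f"min.insync.replicas={min_insync_replicas}"
        )
    # acks="0" or acks="1": the setting is ignored and the write goes through.
    return True
```

In this model, with 3 in-sync replicas and min.insync.replicas=5, `check_write("1", 3, 5)` returns True, while `check_write("all", 3, 5)` raises `NotEnoughReplicas`.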
Can I make all the followers to be in sync all the time?
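As mentioned above, you can enforce it by producing with acks="-1" or acks="all" and setting min.insync.replicas=5. This does not force followers to stay in sync; it makes the broker reject writes whenever fewer than 5 replicas are in sync.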
Is this the expected behaviour? If yes, do I not lose some messages in case the leader crashes?
Yes, if the leader fails you can lose messages. acks=1 means that the leader writes the record to its local log and responds without waiting for acknowledgement from the followers. If the leader fails immediately after acknowledging the record, but before the followers have replicated it, the record is lost.
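The failure scenario can be sketched as a toy simulation in Python. This is not Kafka code: `lost_after_failover` is a hypothetical helper that models one leader, one follower, and a single record, and assumes the leader always crashes at the end and the follower takes over.

```python
def lost_after_failover(acks: str, crash_before_replication: bool) -> list:
    """Return records acknowledged to the producer but missing on the new leader.

    Toy model (assumption): one record "m1", one leader, one follower.
    """
    record = "m1"
    leader_log = [record]      # the leader always appends to its local log first
    follower_log = []
    acked = []

    if acks == "1":
        acked.append(record)   # leader acknowledges right after the local write

    if not crash_before_replication:
        follower_log.append(record)   # follower fetched the record in time
        if acks == "all":
            acked.append(record)      # ack is sent only after replication

    # The leader crashes here; its log is gone and the follower becomes leader.
    return [r for r in acked if r not in follower_log]
```

In this model only the combination acks="1" with a crash before replication yields a lost acknowledged record. With acks="all" the record is never acknowledged in that case, so the producer knows it has to retry.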
Question from comment:
But I observed a difference in throughput between (acks=all, min.insync.replicas=1) and (acks=1). With acks=-1, even though min.insync.replicas=1, the leader waits for all in-sync replicas to acknowledge, hence the lower throughput.
From this answer on Kafka:
Also the min in-sync replica setting specifies the minimum number of
replicas that need to be in-sync for the partition to remain available
for writes. When a producer specifies ack (-1 / all config) it will
still wait for acks from all in sync replicas at that moment
(independent of the setting for min in-sync replicas).
Also, one useful comment:
Just reconfirmed my answer with Jun (co-author of Apache Kafka). It is
a common misconception that min.insync.replicas allows an ACK when
only a minimal subset of the ISRs get the published message. However
the “minimum” part applies to something else. The minimal value is
defining how small the list of ISRs can get and still allow writes.
ACKs are always returned when all ISR in the list get the
message. Otherwise the leader election would be much more complicated
because not all replicas would actually be in sync.
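To make the distinction concrete, here is a minimal Python sketch of which replicas a leader waits for under each acks setting. It is a toy model under stated assumptions: `acks_awaited` is a hypothetical function, the ISR is a list of broker ids with the leader first, and a plain `RuntimeError` stands in for Kafka's NotEnoughReplicas error.

```python
def acks_awaited(acks: str, isr: list, min_insync_replicas: int) -> set:
    """Return the set of replicas the modeled leader waits for before acking.

    Assumption: isr[0] is the leader; the rest are in-sync followers.
    """
    if acks == "0":
        return set()        # producer does not wait for anyone
    if acks == "1":
        return {isr[0]}     # only the leader's local write
    # acks="all" / "-1": min.insync.replicas only gates whether writes are allowed.
    if len(isr) < min_insync_replicas:
        raise RuntimeError("not enough in-sync replicas")
    # Once allowed, the leader waits for every replica currently in the ISR.
    return set(isr)
```

With an ISR of `[1, 2, 3]` and min.insync.replicas=1, `acks_awaited("all", [1, 2, 3], 1)` returns all three brokers, whereas `acks_awaited("1", [1, 2, 3], 1)` returns only the leader, which is consistent with the throughput difference observed in the comment.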
Documentation: https://kafka.apache.org/documentation/#brokerconfigs