We have 3 Kafka brokers and a topic with 40 partitions and a replication factor of 1. After an uncontrolled Kafka broker shutdown, a new leader could not be elected for some partitions (see the logs below). As a result, we cannot read from the topic. Is it possible to survive this kind of crash without increasing the replication factor above 1?
We want our target database (built from the events in the Kafka topic) to stay in a consistent state, so we have also set the parameter unclean.leader.election.enable to false.
Partition info after the crash:
extenr-topic:1:882091242
extenr-topic:19:882091615
extenr-topic:28:882092273
Error: partition 18 does not have a leader. Skip getting offsets
Error: partition 27 does not have a leader. Skip getting offsets
Error: partition 36 does not have a leader. Skip getting offsets
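To get a quick list of the affected partitions from output in the format shown above, a small script along these lines could help. It is only a sketch: the function name `split_partitions` and the regular expressions are my own, and they assume the output always looks like the sample (either `topic:partition:offset` or the "does not have a leader" error line).

```python
import re

# Sample copied verbatim from the partition info above.
SAMPLE = """\
extenr-topic:1:882091242
extenr-topic:19:882091615
extenr-topic:28:882092273
Error: partition 18 does not have a leader. Skip getting offsets
Error: partition 27 does not have a leader. Skip getting offsets
Error: partition 36 does not have a leader. Skip getting offsets
"""

# Assumed line formats, inferred only from the sample output.
OFFSET_RE = re.compile(r"^(?P<topic>[^:]+):(?P<partition>\d+):(?P<offset>\d+)$")
LEADERLESS_RE = re.compile(r"^Error: partition (?P<partition>\d+) does not have a leader")


def split_partitions(text):
    """Return (offsets_by_partition, leaderless_partitions) parsed from text."""
    offsets = {}
    leaderless = []
    for line in text.splitlines():
        line = line.strip()
        match = OFFSET_RE.match(line)
        if match:
            offsets[int(match.group("partition"))] = int(match.group("offset"))
            continue
        match = LEADERLESS_RE.match(line)
        if match:
            leaderless.append(int(match.group("partition")))
    return offsets, sorted(leaderless)


if __name__ == "__main__":
    offsets, leaderless = split_partitions(SAMPLE)
    print("partitions with a leader:", sorted(offsets))
    print("partitions without a leader:", leaderless)
```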
Exception from the Kafka broker:
2017-10-09 05:56:50,302 ERROR state.change.logger: Controller 236 epoch 267 initiated state change for partition [extenr-topic,15] from OfflinePartition to OnlinePartition failed
kafka.common.NoReplicaOnlineException: No broker in ISR for partition [extenr-topic,15] is alive. Live brokers are: [Set(236, 237)], ISR brokers are: [235]
at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:66)
at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:342)
at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:203)
at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:118)
at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:115)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
The logs also contain the following error:
2017-10-09 04:11:25,509 ERROR state.change.logger: Broker 235 received LeaderAndIsrRequest with correlation id 1 from controller 236 epoch 267 for partition [extenr-topic,36] but cannot become follower since the new leader -1 is unavailable.
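For reference, here is how I read the NoReplicaOnlineException above, written as a simplified Python model. This is not Kafka's actual code, only a sketch of the decision the controller seems to make. The names `select_leader` and `NoReplicaOnline` are made up for illustration, and I am assuming the only assigned replica of partition 15 is broker 235 (which follows from replication factor 1 and the ISR in the log).

```python
class NoReplicaOnline(Exception):
    """Hypothetical stand-in for kafka.common.NoReplicaOnlineException."""


def select_leader(assigned_replicas, isr, live_brokers, unclean_election_enabled):
    """Simplified model: pick a leader for an offline partition or raise."""
    # Prefer a broker that is both alive and in the in-sync replica set.
    live_isr = [b for b in isr if b in live_brokers]
    if live_isr:
        return live_isr[0]
    # Without unclean election, an out-of-sync replica may never become leader.
    if not unclean_election_enabled:
        raise NoReplicaOnline("no broker in ISR %s is alive" % sorted(isr))
    # With unclean election, any live assigned replica is acceptable.
    live_assigned = [b for b in assigned_replicas if b in live_brokers]
    if live_assigned:
        return live_assigned[0]
    raise NoReplicaOnline("no assigned replica %s is alive" % sorted(assigned_replicas))


if __name__ == "__main__":
    live = {236, 237}  # "Live brokers are: [Set(236, 237)]"
    for unclean in (False, True):
        try:
            # Assumption: partition 15 has a single replica on broker 235.
            print(select_leader([235], [235], live, unclean))
        except NoReplicaOnline as exc:
            print("unclean=%s -> %s" % (unclean, exc))
```

In this model the election fails for both values of the unclean flag, because with a single replica there is no other broker that holds a copy of the partition.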