Help me understand how Kafka works. I have 3 Kafka brokers with this config:

broker.id=1
listeners=PLAINTEXT://1.2.3.4:9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=105906176
message.max.bytes=105906176
replica.fetch.max.bytes=105906176
log.dirs=/opt/kafka_2.12-2.1.0/logs
num.partitions=10
num.recovery.threads.per.data.dir=10
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=2
transaction.state.log.min.isr=2
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=600000
zookeeper.connect=1.2.3.4:2181,5.6.7.8:2181,3.1.2.1:2181
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=3
auto.create.topics.enable=false
delete.topic.enable=true
replica.fetch.wait.max.ms=500
replica.lag.time.max.ms=10000
request.timeout.ms=305000

After I rebooted all the brokers at the same time, I see this error continuously on 2 of the 3 brokers:

kafka-server-start.sh[2448]: [2020-02-10 06:01:33,969] ERROR [KafkaApi-2] Number of alive brokers '0' does not meet the required replication factor '2' for the transactions state topic (configured via 'transaction.state.log.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)

But all the servers are running, all the Kafka services started, and all the ZooKeeper nodes started (echo stat | nc zookeeper_ip 2181 shows that everything is OK). Why does Kafka say the number of alive brokers is '0'?

On one broker I didn't see any errors. When I restarted the Kafka service (just the service, not the whole server) on the broker that didn't have this error, the cluster started working: all the brokers connected to each other and started replicating partitions, and everything was fine. Why does this happen? Why didn't the brokers see each other after the reboot? Is it that with 3 Kafka brokers we can't reboot all of them at the same time?

2 Answers

0
votes

Did you miss this part of the message?

This error can be ignored if the cluster is starting up and not all brokers are up yet.

No, you should not reboot them all at the same time.

Why does Kafka say the number of alive brokers is '0'?

Because the controller probably hadn't been elected yet, so the brokers had not yet been told which other brokers were alive.
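One way to picture this is the rough Python model below. It is a simplified sketch, not Kafka's actual code: the class and method names are hypothetical, and it assumes each broker counts alive brokers from its own local metadata view, which stays empty until the controller sends it an update.

```python
# Simplified model, not Kafka's real implementation. All names are hypothetical.
# Assumption: a broker's view of "alive brokers" is filled only by updates
# pushed from the controller, so it is empty until a controller exists.

class BrokerModel:
    def __init__(self, broker_id, txn_replication_factor):
        self.broker_id = broker_id
        # Corresponds to transaction.state.log.replication.factor in the question.
        self.txn_replication_factor = txn_replication_factor
        self.alive_brokers = set()

    def on_metadata_update(self, alive_broker_ids):
        """Called when the controller tells this broker who is alive."""
        self.alive_brokers = set(alive_broker_ids)

    def can_create_transaction_topic(self):
        """The check behind the error message: enough alive brokers for the replicas?"""
        return len(self.alive_brokers) >= self.txn_replication_factor


broker2 = BrokerModel(broker_id=2, txn_replication_factor=2)
before = broker2.can_create_transaction_topic()  # no controller update yet: 0 alive
broker2.on_metadata_update([1, 2, 3])            # controller elected and pushes state
after = broker2.can_create_transaction_topic()
print(before, after)
```

In this model the broker itself is running the whole time, yet it reports 0 alive brokers until the first update arrives, which matches the symptom in the question.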

You should figure out which broker is the controller, do a rolling restart of the other brokers first, and reboot the controller last.
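As a rough sketch of that ordering: in a ZooKeeper-based cluster the active controller is recorded as JSON in the /controller znode, which you can read with zookeeper-shell.sh (get /controller). The helper names below are hypothetical and the sample znode value is made up for illustration; the broker IDs 1 to 3 are assumed from the question.

```python
import json

def controller_id(controller_znode_json):
    """Extract the broker id from the JSON stored in the /controller znode."""
    return json.loads(controller_znode_json)["brokerid"]

def rolling_restart_order(broker_ids, controller):
    """Return the broker ids with the controller moved to the end."""
    others = sorted(b for b in broker_ids if b != controller)
    return others + [controller]

# Made-up sample value; read the real one from ZooKeeper.
znode = '{"version":1,"brokerid":2,"timestamp":"1581314493969"}'
order = rolling_restart_order([1, 2, 3], controller_id(znode))
print(order)  # [1, 3, 2]
```

Restart one broker at a time in that order, and wait for each to rejoin and catch up on its replicas before moving on to the next.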

If you have more brokers than replicas and have rack awareness enabled, you could get away with restarting more than one at a time.

0
votes

In a ZooKeeper-based Kafka cluster, ZooKeeper maintains the list of all available brokers, and if the controller (one of the brokers) crashes, a new controller is elected through ZooKeeper.

If all the brokers are restarted at the same time, there is a short window in which no controller has been elected, so no broker has any information yet about which other brokers are alive.

The first broker to come up is then elected as the controller, while the rest of the brokers are still unaware of the other available brokers.

Hence Kafka initially logs this error while restarting, and at that stage it can be ignored.