Apache storm kafka spout only reading from half of a topic's partitions

Question

A problem developed on our production Storm cluster that we cannot figure out or work around.

At some point it appears that the kafka spout stopped reading from half of the topic partitions. There are 40 partitions, and it is only reading from 20 of them. We cannot find any changes that we made to either the storm cluster or kafka at the time this started happening.

We changed the consumer group id and set the spout config startOffsetTime to OffsetRequest.LatestTime to try to get it reading fresh data from all partitions. It still only connects to the same 20 partitions. We've looked at the node /<topic>/<consumer_group> inside the Storm zookeeper and see only 20 partitions there.

We have verified that messages are being published to all 40 partitions.

Kafka version is 0.9.0.1,storm version is 1.1.0.

Any tips on how to debug or where to look would be greatly appreciated. Did I mention that this is happening in production? Did I mention it started a week ago, and we just noticed this morning? :(

Additional info: we found some errors in the Kafka state change log (partition 9 is one of the affected partitions and the timestamp in the log looks to be about the time that the problem started)

kafka.common.NoReplicaOnlineException: No replica for partition 
[transcription-results,9] is alive. Live brokers are: [Set()], Assigned replicas are: [List(1, 4, 0)]
[2018-03-14 03:11:40,863] TRACE Controller 0 epoch 44 changed state of replica 1 for partition [transcription-results,9] from OnlineReplica to OfflineReplica (state.change.logger)
[2018-03-14 03:11:41,141] TRACE Controller 0 epoch 44 sending become-follower LeaderAndIsr request (Leader:-1,ISR:0,4,LeaderEpoch:442,ControllerEpoch:44) to broker 4 for partition [transcription-results,9] (state.change.logger)
[2018-03-14 03:11:41,145] TRACE Controller 0 epoch 44 sending become-follower LeaderAndIsr request (Leader:-1,ISR:0,4,LeaderEpoch:442,ControllerEpoch:44) to broker 0 for partition [transcription-results,9] (state.change.logger)
[2018-03-14 03:11:41,208] TRACE Controller 0 epoch 44 changed state of replica 4 for partition [transcription-results,9] from OnlineReplica to OnlineReplica (state.change.logger)
[2018-03-14 03:11:41,218] TRACE Controller 0 epoch 44 changed state of replica 1 for partition [transcription-results,9] from OfflineReplica to OnlineReplica (state.change.logger)
[2018-03-14 03:11:41,226] TRACE Controller 0 epoch 44 sending become-follower LeaderAndIsr request (Leader:-1,ISR:0,4,LeaderEpoch:442,ControllerEpoch:44) to broker 4 for partition [transcription-results,9] (state.change.logger)
[2018-03-14 03:11:41,230] TRACE Controller 0 epoch 44 sending become-follower LeaderAndIsr request (Leader:-1,ISR:0,4,LeaderEpoch:442,ControllerEpoch:44) to broker 1 for partition [transcription-results,9] (state.change.logger)
[2018-03-14 03:11:41,450] TRACE Broker 0 received LeaderAndIsr request (LeaderAndIsrInfo:Leader:-1,ISR:0,4,LeaderEpoch:442,ControllerEpoch:44),ReplicationFactor:3),AllReplicas:1,4,0) correlation id 158 from controller 0 epoch 44 for partition [transcription-results,9] (state.change.logger)
[2018-03-14 03:11:41,454] TRACE Broker 0 handling LeaderAndIsr request correlationId 158 from controller 0 epoch 44 starting the become-follower transition for partition [transcription-results,9] (state.change.logger)
[2018-03-14 03:11:41,455] ERROR Broker 0 received LeaderAndIsrRequest with correlation id 158 from controller 0 epoch 44 for partition [transcription-results,9] but cannot become follower since the new leader -1 is unavailable. (state.change.logger)
//... removed some TRACE statements here
[2018-03-14 03:11:41,908] WARN Broker 0 ignoring LeaderAndIsr request from controller 1 with correlation id 1 epoch 47 for partition [transcription-results,9] since its associated leader epoch 441 is old. Current leader epoch is 441 (state.change.logger)
[2018-03-14 03:11:41,982] TRACE Broker 0 cached leader info (LeaderAndIsrInfo:Leader:1,ISR:0,1,4,LeaderEpoch:441,ControllerEpoch:44),ReplicationFactor:3),AllReplicas:1,4,0) for partition [transcription-results,9] in response to UpdateMetadata request sent by controller 1 epoch 47 with correlation id 2 (state.change.logger)
[2018-03-22 14:43:36,098] TRACE Broker 0 received LeaderAndIsr request (LeaderAndIsrInfo:Leader:-1,ISR:,LeaderEpoch:444,ControllerEpoch:47),ReplicationFactor:3),AllReplicas:1,4,0) correlation id 679 from controller 1 epoch 47 for partition [transcription-results,9] (state.change.logger)

Possibly caused by this bug: https://issues.apache.org/jira/browse/KAFKA-3963

But how can we recover from it?

We suspect that we might have had two consumers reading simultaneously, one starting at beginning of topic and one at end, messing up the kafka offset state. Trying to figure out how to reset that now. — ryryguy
If the state that got messed up was the offsets in Storm's Zookeeper, you might be able to get it to reset by setting ignoreZkOffsets github.com/apache/storm/blob/… — Stig Rohde Døssing
Okay. Consuming from Kafka should have no effect on the stored offsets or metadata in Kafka, so this is a little surprising. Since the partitions in Kafka's Zookeeper should be fine, you could take at look at github.com/apache/storm/blob/v1.2.1/external/storm-kafka/src/… which is the code Storm uses to read the partitions from Kafka. — Stig Rohde Døssing
I would try the Kafka mailing list as well, people there are likely to know about it if there's a bug in 0.9.0.1 that could break a topic kafka.apache.org/contact — Stig Rohde Døssing

Stig Rohde Døssing Stig Rohde Døssing · Accepted Answer · 2018-03-22T16:20:20

I'd start by looking in Kafka's Zookeeper under /brokers/topics to verify that all partitions are listed. That's where storm-kafka reads the partitions from.

Apache storm kafka spout only reading from half of a topic's partitions

1 Answers