5
votes

I'm having an odd issue with Apache Storm. I have a KafkaSpout hooked up to a Kafka cluster with 10 messages in it.

The bolts receive each message and appear to process it correctly, because the Storm UI lists the tuples as 'acked'. However, the Storm UI shows all of the spout's tuples as failed.

I believe this causes the spout to re-emit all of the messages again... So I am seeing a Storm Bolt print out messages 1-10 and then print them out in the same order over and over and over.

I am calling the .ack() and .fail() methods appropriately; I just don't know why the spout would list the tuples as failed.

Any thoughts?

2
Could you please share how you are acking the tuple while emitting? - user2720864
@user2720864 Thanks for the comment - it was indeed an issue of not acking correctly. - joshft91

2 Answers

3
votes

It turns out that a couple of bolts downstream were not acking when they finished processing a tuple. This caused the spout tuple to fail, so the spout sent the tuple again, which resulted in a continuous loop.

2
votes

When the spout reads a message and passes it to the bolts, the message must be fully processed (by all relevant bolts) within TOPOLOGY_MESSAGE_TIMEOUT_SECS ("topology.message.timeout.secs").

All relevant bolts must ack, and then the acker tells the spout that the message was processed (in the case of the Kafka spout, the spout will then increment the offset).
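To make the "all relevant bolts must ack" point concrete, here is a rough sketch of the bookkeeping idea the Storm docs describe for the acker: every tuple id in a tree is XORed into one value when the tuple is emitted and again when it is acked, so the value returns to zero only once everything has been acked. This is a toy model in plain Java, not Storm's actual classes; `AckerSketch`, its method names, and the ids used are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of acker bookkeeping. Hypothetical class, not part of Storm. */
class AckerSketch {
    // One running XOR value per spout tuple, keyed by an illustrative root id.
    private final Map<Long, Long> pending = new HashMap<>();

    /** A tuple was emitted into the tree rooted at rootId. */
    void emitted(long rootId, long tupleId) {
        pending.merge(rootId, tupleId, (a, b) -> a ^ b);
    }

    /** A bolt acked a tuple: XORing the same id again cancels it out. */
    void acked(long rootId, long tupleId) {
        pending.merge(rootId, tupleId, (a, b) -> a ^ b);
    }

    /** True only if the root is known and every emitted id has been acked. */
    boolean isFullyProcessed(long rootId) {
        Long value = pending.get(rootId);
        return value != null && value == 0L;
    }

    public static void main(String[] args) {
        AckerSketch acker = new AckerSketch();
        acker.emitted(1L, 11L); // spout emits the original tuple
        acker.emitted(1L, 22L); // first bolt emits an anchored tuple...
        acker.acked(1L, 11L);   // ...and acks its input
        // The second bolt processes tuple 22 but forgets to ack it.
        System.out.println("fully processed? " + acker.isFullyProcessed(1L));
        acker.acked(1L, 22L);   // once the missing ack arrives, the tree completes
        System.out.println("fully processed? " + acker.isFullyProcessed(1L));
    }
}
```

In this model a single missing ack leaves the value non-zero forever, which corresponds to the spout tuple never completing and eventually being failed on timeout.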

If you see "SPOUT Failing" in the logs, perhaps:

  1. One of your bolts failed the message
  2. One of your bolts did not ack
  3. The bolts did not complete handling the message within topology.message.timeout.secs, so an ack was not sent on time.

Example of #3: if you have 5 bolts and each takes about 10 seconds due to DB connection issues, then after bolt #3 you will hit the default 30-second Storm timeout, and the message will fail. The spout will then replay the message.
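The arithmetic in that example can be checked with a few lines of plain Java. This is only a simplified model: it assumes the bolts run strictly one after another and ignores queueing and the way Storm actually schedules its timeout checks. The class and method names are made up for illustration.

```java
/** Simplified timeout arithmetic for a chain of bolts. Hypothetical helper, not part of Storm. */
class TimeoutSketch {
    /** True if the summed bolt latencies fit within the message timeout. */
    static boolean completesInTime(int[] boltSecs, int timeoutSecs) {
        int total = 0;
        for (int secs : boltSecs) {
            total += secs;
        }
        return total <= timeoutSecs;
    }

    /** Number of bolts that finish before the timeout budget is used up. */
    static int boltsFinishedWithin(int[] boltSecs, int timeoutSecs) {
        int elapsed = 0;
        int finished = 0;
        for (int secs : boltSecs) {
            elapsed += secs;
            if (elapsed > timeoutSecs) {
                break;
            }
            finished++;
        }
        return finished;
    }

    public static void main(String[] args) {
        int[] slowBolts = {10, 10, 10, 10, 10}; // 5 bolts, about 10 seconds each
        System.out.println("in time with 30s: " + completesInTime(slowBolts, 30));
        System.out.println("bolts done within 30s: " + boltsFinishedWithin(slowBolts, 30));
        System.out.println("in time with 60s: " + completesInTime(slowBolts, 60));
    }
}
```

With these numbers, only three of the five bolts fit inside the 30-second budget, so the tuple cannot be fully acked in time; either a larger timeout or faster bolts makes the chain fit.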

So either raise the timeout configuration or fail faster (for example, with a shorter DB connection timeout). Lowering TOPOLOGY_MAX_SPOUT_PENDING can sometimes help too, when lots of messages are waiting to be processed and earlier messages take a long time.

See "Guaranteeing Message Processing" in the Apache Storm documentation for more.