5
votes

It's mentioned in the storm documentation, that storm replays tuple which processing has timed out. My question is if the storm do this automatically (without calling fail() on the origin spout) or is this rather responsibility of the origin spout to replay the tuple (the fail() is called and replay should be implemented inside or even somewhere externally)?

2

2 Answers

6
votes

In order to have a proper replay on a timeout, you must anchor the tuple with an id when you emit it from the spout. When the timeout occurs, whatever you used as an anchor is returned to the fail method (fail(object anchorId)). Now you can use the anchorId of the failed/timedout tuple to replay or anything else you want to do with the timeout tuple. Each anchor id must be unique. An example of an anchor id is a database id. When you tuple fails, you can use the databse id to recreate your tuple and re-emit it. So to answer your question you must have your replay logic inside the fail and you can use the anchorId to recreate your tuple. Hope this info helps

4
votes

From http://storm.apache.org/documentation/Guaranteeing-message-processing.html,

if the tuple times-out Storm will call the fail method on the Spout

So yes, fail will be called.