3
votes

Right now Storm Spouts have an open method to configure them and Bolts have a prepare method. Is there any way to make all the Spout instances wait for all the prepare methods on the Bolts listening to them to finish?

I have a case where I would like to pass some config info to the bolts on the fly (since this config info changes all the time). I've read in some places that we should use Zookeeper or an in-memory key-value storage like redis to do this. My worry though is, what happens if the Bolts aren't ready to process data from Spouts yet, and the Spouts start emitting tuples? Is there a way to make the Spouts wait for an update from the Bolts saying they're ready?

3
One options is to use storm's Utils class in nextTuple() method of spout, Utils.sleep(ms) only for the first time (not for each tuple). - Ajeesh

3 Answers

2
votes

I found a slightly more elegant solution for this (I think). The problem was that certain bolts needed config info in order to process incoming tuples. I figured out Storm's capability to replay tuples, so now my bolts listen for updates from one spout, and tuples from the other. As long as I dont receive updates, I keep failing the tuples and having the spout replay them after a configurable amount of time.

0
votes

Yes, you can use Redis to store your configuration then read it from the prepare method.

The prepare method is invoked by the worker process which start processing tuples after finishing. Actually, I think that no tuple is emitted until all components of a worker process are ready. http://nathanmarz.github.io/storm/doc-0.8.1/index.html

Finally, you can have an additional spout which look up for configuration changes. Then, if a newer configuration is available it is send to your bolts via named streams.

0
votes

You don't have to worry about this. Storm framework loads Bolt before Spout. Storm loads the bolts in reverse order. Bolts towards the end of the topology are loaded before the bolts in the middle of the topology and in the end, Spout gets loaded.