8
votes

I am using Storm (java) with Cassandra.

One of my bolts inserts data into Cassandra. Is there any way to hold the connection to Cassandra open between instantiations of this bolt?

My application writes at a high rate. The bolt needs to run several times a second, and performance is being hindered by the fact that it connects to Cassandra each time.

It would run a lot faster if I could have a static connection that was held open, but I am not sure how to achieve this in Storm.

To clarify the question:

What is the scope of a static connection in a Storm topology?

Unlike other messaging systems, which have workers where the "work" goes on in a loop or callback that can make use of a variable (maybe a static connection) outside this loop, Storm's bolts seem to be instantiated each time they are called and cannot have parameters passed in to them. So how can I use the same connection to Cassandra?

1
This question has been voted down with no comments, which is ridiculous. Please ask me for more info if that is what is required. - girlcoder
Thank you, whoever voted it back to 0. - girlcoder
Most Cassandra drivers/clients support connection pooling. I don't think you need to keep your connection open. - Chiron
As I understand it, this is not what a connection pool is used for. If I have a client application and I am actually calling connect and disconnect, a connection pool will not have any effect on this. - girlcoder
As an extension to the question: when and how do you close a connection to a database in a Storm topology? E.g. if I launch a topology and it runs for, say, 6 months, will it just keep the connection open without releasing resources? - Albatross

1 Answer

9
votes

Unlike other messaging systems, which have workers where the "work" goes on in a loop or callback that can make use of a variable (maybe a static connection) outside this loop, Storm's bolts seem to be instantiated each time they are called and cannot have parameters passed in to them

It's not exactly right to say that Storm bolts get instantiated each time they are called. For example, the prepare method only gets called during the initialization phase, i.e. only once. The documentation says:
Called when a task for this component is initialized within a worker on the cluster. It provides the bolt with the environment in which the bolt executes.

So the best bet would be to put the initialization code in the prepare method (or open, in the case of spouts), since these are called when the tasks are starting. But anything the tasks share, such as a static connection, must be initialized in a thread-safe way, because every task calls prepare concurrently in its own thread.
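
As a rough illustration of the thread-safety point, here is a minimal sketch of a lazily created connection shared by every task in one JVM (in Storm terms, one worker process). It deliberately has no Storm or Cassandra dependency: FakeSession and SharedSession are hypothetical names invented for this example, and FakeSession only stands in for whatever session object your driver provides.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a Cassandra session; not a real driver class.
final class FakeSession {
    // Counts how many "connections" have been opened in this JVM.
    static final AtomicInteger OPENED = new AtomicInteger();

    FakeSession() {
        OPENED.incrementAndGet();
    }
}

// Holds one shared session per JVM. A static field is visible only inside
// the process that loaded the class, so each worker gets its own copy.
final class SharedSession {
    private static volatile FakeSession session;

    private SharedSession() {
    }

    // Intended to be called from each task's prepare(); only the first
    // caller actually connects, later callers reuse the same object.
    static FakeSession get() {
        FakeSession local = session;
        if (local == null) {
            synchronized (SharedSession.class) {
                local = session;
                if (local == null) {
                    local = new FakeSession();
                    session = local;
                }
            }
        }
        return local;
    }
}
```

With this shape, each task would call SharedSession.get() once in prepare and keep the result in a field. Whether the returned object can then be used from several threads at once depends on the driver, so that part is an assumption to check against its documentation.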

The execute(Tuple tuple) method, on the other hand, is responsible for the processing logic and is called every time the bolt receives a tuple from the corresponding spouts or bolts (so this is what actually gets called every single time the bolt runs).


The cleanup method is called when an IBolt is going to be shut down. The documentation says:

There is no guarantee that cleanup will be called, because the supervisor kill -9's worker processes on the cluster. The one context where cleanup is guaranteed to be called is when a topology is killed when running Storm in local mode.
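
One practical consequence, sketched below under stated assumptions: because cleanup may never run on a cluster, it seems safer to write each tuple as it arrives and treat closing the connection as best effort only. RecordingConnection and WriteThroughBolt are hypothetical classes made up for this example. They mirror the method names of a bolt but do not use the Storm API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a database connection; not a real driver class.
final class RecordingConnection implements AutoCloseable {
    final List<String> written = new ArrayList<>();
    boolean closed = false;

    void write(String row) {
        if (closed) {
            throw new IllegalStateException("connection is closed");
        }
        written.add(row);
    }

    @Override
    public void close() {
        closed = true; // calling close() twice is harmless
    }
}

// Mirrors the shape of a bolt without depending on Storm.
final class WriteThroughBolt {
    private RecordingConnection connection;

    void prepare(RecordingConnection connection) {
        this.connection = connection;
    }

    // Each tuple is written immediately, so no data is waiting in memory
    // if the worker is killed before cleanup() gets a chance to run.
    void execute(String tuple) {
        connection.write(tuple);
    }

    // Best effort only: on a cluster this may never be called.
    void cleanup() {
        if (connection != null) {
            connection.close();
        }
    }
}
```

If the worker process is killed, the operating system closes its sockets anyway, so a long-running topology holding one connection open is not in itself a resource leak on the client side.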

So it's not true that you can't pass a variable to it: you can initialize any instance variables in the prepare method and then use them during processing.
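
To make the prepare/execute split concrete, here is a minimal sketch that connects once and then reuses the connection for every tuple. It is not the Storm API: InsertBolt only borrows the method names, and CountingClient is a hypothetical stand-in for a Cassandra client that just counts calls.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a Cassandra client; it only counts calls.
final class CountingClient {
    static final AtomicInteger CONNECTS = new AtomicInteger();
    final AtomicInteger inserts = new AtomicInteger();

    static CountingClient connect() {
        CONNECTS.incrementAndGet();
        return new CountingClient();
    }

    void insert(String row) {
        inserts.incrementAndGet();
    }
}

// Same method names as a bolt, but no Storm dependency.
final class InsertBolt {
    // In a real bolt a field like this is typically marked transient and
    // set in prepare(), because the bolt object itself gets serialized
    // when the topology is submitted.
    private transient CountingClient client;

    // Runs once per task: the expensive connect happens here.
    void prepare() {
        client = CountingClient.connect();
    }

    // Runs once per tuple: reuses the connection opened in prepare().
    void execute(String tuple) {
        client.insert(tuple);
    }

    int insertCount() {
        return client.inserts.get();
    }
}
```

Calling prepare() once and execute() a thousand times leaves CountingClient.CONNECTS at 1, which is the behaviour the question is asking for.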

Regarding the DB connection, I am not exactly sure about your use case, as you have not posted any code, but maintaining a pool of resources sounds like a good choice to me.
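
For what a pool of resources could look like, here is a minimal sketch of a fixed-size pool built on java.util.concurrent. SimplePool is a hypothetical class written for this answer, not part of Storm or any Cassandra driver. As noted in the comments on the question, most drivers already provide pooling, so in practice you would probably rely on that rather than write your own.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// A tiny fixed-size pool; T stands for whatever connection type is pooled.
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // open every connection up front
        }
    }

    // Blocks until a connection is free, then hands it to the caller.
    T borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
    }

    // Returns a borrowed connection so another caller can reuse it.
    void giveBack(T resource) {
        idle.add(resource);
    }

    int available() {
        return idle.size();
    }
}
```

A bolt would create (or look up) the pool in prepare, then borrow and give back a connection around each write in execute.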