I am currently evaluating Apache Storm to process heterogeneous data from multiple data sources. While there may be some common properties shared by all data (e.g., a "type" property), I would like to be able to handle many different "classes" of tuples and also be able to handle new data types with minimal changes to the topology. To give an example of what these data types might look like:
{type=LogTransaction,timestamp=...,user=...,duration=...}
{type=LogEvent,timestamp=...,user=...,message=...}
The examples on the Storm page primarily deal with simple Tuples which are well-defined in advance so that the spouts / bolts can statically declare the output fields.
My initial idea was to declare the type and store all other properties in a Map<String,Object>, which could then be declared:
    public void declareOutputFields(OutputFieldsDeclarer ofd) {
        ofd.declare(new Fields("type", "attributes"));
    }
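To make the idea concrete, here is a rough sketch in plain Java with no Storm dependency. The class name `HeterogeneousTupleSketch` and the method `toValues` are hypothetical and exist only for illustration. It splits a record into the two values matching the "type" and "attributes" fields declared above; in a real spout or bolt, the resulting pair would presumably be what gets emitted.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Storm: shows the (type, attributes) layout only.
final class HeterogeneousTupleSketch {

    // Splits a record into two positional values: the "type" property and
    // a map holding every remaining property.
    static List<Object> toValues(Map<String, Object> record) {
        Map<String, Object> attributes = new HashMap<>(record); // copy, leave the input untouched
        Object type = attributes.remove("type");                // null if the record has no type
        return Arrays.asList(type, attributes);
    }

    public static void main(String[] args) {
        Map<String, Object> event = new HashMap<>();
        event.put("type", "LogEvent");
        event.put("user", "alice");
        event.put("message", "login ok");

        List<Object> values = toValues(event);
        System.out.println(values.get(0)); // the type
        System.out.println(values.get(1)); // the remaining attributes
    }
}
```

With this layout, only "type" is visible to Storm as a named field; everything inside the map is opaque to the topology.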
However, I believe at that point many of the more advanced features of Storm will no longer work correctly. For example, it is my understanding that I could no longer use Trident to execute a groupBy on any of the attributes, since they would no longer be declared as individual fields.
Is there a better way to handle this type of data in Apache Storm that I have missed? I did find this post describing a similar issue; however, I would like to avoid having to create a Java class for each data type.