I am trying to put Kafka data through Storm into HDFS and Hive. I am working with Hortonworks, so I have the following structure, as seen (slightly modified) in many tutorials (http://henning.kropponline.de/2015/01/24/hive-streaming-with-storm/):
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("kafka-spout", kafkaSpout);
// raw Kafka messages go straight to HDFS
builder.setBolt("hdfs-bolt", hdfsBolt).globalGrouping("kafka-spout");
// the same messages are also parsed, and the parsed fields go to Hive
builder.setBolt("parse-bolt", new ParseBolt()).globalGrouping("kafka-spout");
builder.setBolt("hive-bolt", hiveBolt).globalGrouping("parse-bolt");
I send the kafka-spout data directly to the hdfs-bolt, which works fine as long as I only use the hdfs-bolt. As soon as I add the parse-bolt to parse the Kafka data and emit it to the hive-bolt, the whole system goes crazy: even when I send only a single message over Kafka, that message is duplicated by the kafka-spout endlessly and written to HDFS over and over again.
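For context, my ParseBolt is a plain BaseRichBolt along these lines (simplified; the field names and the split logic are just placeholders for the real parsing, and again the package prefix depends on the Storm version):

import backtype.storm.task.OutputCollector;   // org.apache.storm.* on newer Storm
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import java.util.Map;

public class ParseBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        // the raw Kafka message arrives as a single string field from the spout
        String raw = tuple.getString(0);
        String[] parts = raw.split("\\|");                      // placeholder parsing
        collector.emit(tuple, new Values(parts[0], parts[1]));  // anchored emit
        collector.ack(tuple);                                   // ack so the spout does not replay the tuple
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("field1", "field2"));       // placeholder fields
    }
}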
If there is an error in the parse-bolt, shouldn't the hdfs-bolt still work normally? I'm new to this topic; does anyone see a simple beginner's mistake? I am grateful for any advice.