1 vote

I'm running a Storm topology (Storm 0.10.0) that uses the default KafkaSpout to fetch JSON data from a Kafka topic and process it.
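For context, the spout is wired into the topology roughly like the sketch below. The Zookeeper address, topic, and component names are placeholders, and `JsonProcessingBolt` is just a stand-in for my processing bolt (sketched further down), not my actual code:

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class JsonTopology {
    public static void main(String[] args) throws Exception {
        // Zookeeper-backed KafkaSpout from the storm-kafka module shipped with Storm 0.10.0
        ZkHosts hosts = new ZkHosts("zk1:2181");
        SpoutConfig spoutConfig = new SpoutConfig(hosts, "json-topic", "/kafka-spout", "json-consumer");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 4);
        builder.setBolt("json-bolt", new JsonProcessingBolt(), 8).shuffleGrouping("kafka-spout");

        Config conf = new Config();
        conf.setNumWorkers(2);
        StormSubmitter.submitTopology("json-topology", conf, builder.createTopology());
    }
}
```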

Sometimes the topology peaks at 500k messages per sec without any issue, but usually stays at ~10k messages per sec.

Usually I don't have any performance issues, but after a variable amount of time the spout shows a few failed tuples and the topology's output slows down.

I've already double-checked that all tuples that reach the bolts get acked, and there aren't any errors showing in the logs.
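For reference, each bolt acks explicitly, along the lines of this minimal sketch (the class name and the JSON handling are placeholders, not my actual code):

```java
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

import java.util.Map;

public class JsonProcessingBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        try {
            String json = input.getString(0);
            // ... process the JSON message ...
            collector.ack(input);   // every tuple that reaches the bolt is acked
        } catch (Exception e) {
            collector.fail(input);  // explicit fail so the spout replays the tuple
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // no downstream stream declared in this sketch
    }
}
```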

Any idea why this happens? Any extra information I can provide that helps debug this problem?

Have you found a solution to this problem? - Daniccan
Have you found a solution to this issue? - user608020
Can't say that I did. I was using the newer versions of the KafkaSpout, which do not use Zookeeper. My solution was to go back to the Zookeeper-based spout and everything worked fine. Hopefully newer versions won't have this problem, but I haven't tested them. - fbexiga

1 Answer

0 votes

Upgrade your Storm application to the latest version; a lot of features have been added to Kafka and Storm since 0.10.0. Then:

  1. Configure an optimal number of threads/executors for the spout and bolts.
  2. Check the latency of the bolts/spouts; if it is more than 1 second, there is a bottleneck somewhere (profile the spout/bolt code).
  3. The Kafka topic may contain more than 100 million messages (reduce the retention period).
  4. Check the Storm worker memory (2 GB to 4 GB is optimal).
  5. Set the topology.max.spout.pending property, starting at 1000 and increasing as needed (points 1, 4 and 5 are illustrated in the sketch after this list).
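As a rough illustration of points 1, 4 and 5, these knobs can be set when building the topology. The component names, parallelism, and memory values below are placeholders to be tuned for your own load, and `JsonProcessingBolt` stands in for whatever bolt does the JSON processing:

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;

public class TunedTopology {
    public static void submit(SpoutConfig spoutConfig) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // Point 1: explicit executor counts for the spout and the bolt.
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 4);
        builder.setBolt("json-bolt", new JsonProcessingBolt(), 8)
               .shuffleGrouping("kafka-spout");

        Config conf = new Config();
        conf.setNumWorkers(2);
        // Point 5: cap in-flight tuples per spout task; start at 1000 and raise it gradually.
        conf.setMaxSpoutPending(1000);
        // Point 4: give each worker JVM 2-4 GB of heap.
        conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx2g");

        StormSubmitter.submitTopology("json-topology-tuned", conf, builder.createTopology());
    }
}
```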