0 votes

I am new to Apache Storm and Kafka. As part of a POC, I am trying to process a message stream using Kafka and Apache Storm. I am using the storm-kafka source from https://github.com/apache/storm/tree/master/external/storm-kafka, and I was able to create a sample program which reads messages from a Kafka topic using KafkaSpout and writes them to another Kafka topic.

I have a 3-node Kafka cluster (all three brokers running on the same server) and created the topics with 8 partitions. I set the KafkaSpout parallelism to 8 and the bolt's parallelism to 8 as well, trying both 8 executors and 8 tasks. I need message-processing guarantees, so acking is required. The Storm cluster has one supervisor, and ZooKeeper has 3 nodes shared between Kafka and Storm. Everything runs on a Red Hat Linux machine with 144 MB RAM and 16 CPUs.

I have tried a lot of tuning parameters at the Kafka level, the SpoutConfig level and the Storm level, but I am still seeing very high overall latency: with the parameters below, the spout process latency is about 40 seconds, while I need to reach around 50K msg/sec. Can you please help me with the configuration to achieve this? I have gone through a lot of posts on various sites and tried many tuning options with no results.

Storm config
topology.receiver.buffer.size=16
topology.transfer.buffer.size=4096
topology.executor.receive.buffer.size=16384
topology.executor.send.buffer.size=16384
topology.spout.max.batch.size=65536
topology.max.spout.pending=10000
topology.acker.executors=20
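
For reference, I pass these values to the topology roughly like this (a simplified sketch; I use the raw string keys from above rather than the Config constants, and the package name assumes Storm 1.x):

    import org.apache.storm.Config;

    Config conf = new Config();
    conf.put("topology.receiver.buffer.size", 16);
    conf.put("topology.transfer.buffer.size", 4096);
    conf.put("topology.executor.receive.buffer.size", 16384);
    conf.put("topology.executor.send.buffer.size", 16384);
    conf.put("topology.spout.max.batch.size", 65536);
    conf.put("topology.max.spout.pending", 10000);
    conf.put("topology.acker.executors", 20);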

Kafka config
fetch.size.bytes=1048576
socket.timeout.ms=10000
fetch.max.wait=10000
buffer.size.bytes=1048576
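
The spout itself is wired up roughly like this (again a simplified sketch: the ZooKeeper quorum, topic name, zkRoot and spout id are placeholders, package names assume Storm 1.x, and the consumer settings above map to public fields on SpoutConfig):

    import org.apache.storm.kafka.BrokerHosts;
    import org.apache.storm.kafka.KafkaSpout;
    import org.apache.storm.kafka.SpoutConfig;
    import org.apache.storm.kafka.StringScheme;
    import org.apache.storm.kafka.ZkHosts;
    import org.apache.storm.spout.SchemeAsMultiScheme;

    BrokerHosts hosts = new ZkHosts("zk1:2181,zk2:2181,zk3:2181");   // placeholder quorum
    SpoutConfig spoutConfig = new SpoutConfig(hosts, "input-topic", "/kafkaspout", "kafka-spout-id");
    spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
    spoutConfig.fetchSizeBytes = 1048576;     // fetch.size.bytes
    spoutConfig.socketTimeoutMs = 10000;      // socket.timeout.ms
    spoutConfig.fetchMaxWait = 10000;         // fetch.max.wait
    spoutConfig.bufferSizeBytes = 1048576;    // buffer.size.bytes
    KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);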

Thanks in advance.

Storm UI screenshot



3 Answers

2 votes

Your topology has several problems:

  1. You should have the same number of spout executors as Kafka partitions.
  2. Your topology can't handle the tuples fast enough; I am surprised the tuples didn't start to fail by timeout. Use a reasonable value for topology.max.spout.pending; I recommend 150 or so.
    1. This will only prevent timeouts; your spouts will still consume tuples slowly because the remainder of the topology can't handle them.
  3. You need to add more executors to your bolts; the only way your topology gets faster is by bringing more execution units into play. Executors and tasks are not the same thing: you need to put more executors in the topology. Your single-executor latency is 0.097 ms, which means one executor can process around 10,309 tuples per second. That is, to reach your goal of 50K per second you need at least 5 bolt executors. I am sure that with your 16-CPU machine you can use more than one CPU to work on the bolt. A sketch of the wiring is shown after this list.
  4. The main purpose of tasks is to let you promote them to executors during a rebalance; therefore num tasks >= num executors.
  5. If you are using global grouping, you will need to redesign your topology to use something like fields grouping instead.
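
A minimal sketch of what points 1, 3 and 4 could look like when building the topology (component ids, the bolt class and the exact numbers are illustrative; kafkaSpout stands for your already-configured KafkaSpout and ProcessBolt for your bolt implementation):

    import org.apache.storm.Config;
    import org.apache.storm.topology.TopologyBuilder;

    TopologyBuilder builder = new TopologyBuilder();

    // point 1: one spout executor per Kafka partition (8 partitions -> parallelism hint 8)
    builder.setSpout("kafka-spout", kafkaSpout, 8);

    // point 3: give the bolt enough executors to keep up with the spouts;
    // point 4: extra tasks leave headroom to add executors later via rebalance
    builder.setBolt("process-bolt", new ProcessBolt(), 8)   // ProcessBolt = placeholder for your bolt
           .setNumTasks(16)
           .shuffleGrouping("kafka-spout");                 // point 5: shuffle/fields grouping, not global

    // point 2: cap the number of un-acked tuples in flight per spout task
    Config conf = new Config();
    conf.setMaxSpoutPending(150);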
0 votes

Looking at your UI screenshot, it seems that your spouts emit more data than your bolt can process. Both spouts emitted about 500K messages, but only 250K got acked (the same can be inferred from the bolt's executed-tuple count -- it is about 480K, which is half of the tuples emitted by both spouts). Is the latency of 40s the same value from the beginning on, or does the latency increase over time? If it increases over time, it is clear that your bolt is the bottleneck. You have two options:

  1. increase the parallelism of the bolt, and/or
  2. set topology.max.spout.pending to throttle the spout output rate

The first option only makes sense if you have enough cores (but this should not be a problem so far, as you mention 16 available CPUs). Whether the second option is applicable depends on the throughput you want to achieve: you mentioned 50K msg/sec, but the UI does not show the current throughput (i.e., the spout output rate), thus I cannot tell if throttling is an option. Furthermore, you must determine the best value for topology.max.spout.pending by trial and error (starting with a value of 1000 seems reasonable to me).
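
For the second option, a minimal sketch (assuming the same Config object you use when submitting the topology; 1000 is only the suggested starting point for the trial-and-error search):

    import org.apache.storm.Config;

    Config conf = new Config();
    // at most 1000 un-acked tuples in flight per spout task; raise or lower this
    // while watching complete latency and acked counts in the Storm UI
    conf.setMaxSpoutPending(1000);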

0 votes

I don't know if your problem is solved or not, but apart from tuning topology.max.spout.pending based on your latency requirements, you also need to tune your batch size. Setting topology.spout.max.batch.size to a lower number may help reduce latency.
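
For example (a sketch; there is no dedicated setter for this key, so the raw string key from the question is reused, and the value shown is only an illustrative starting point to tune from):

    import org.apache.storm.Config;

    Config conf = new Config();
    // smaller spout batches trade some throughput for lower per-tuple latency;
    // 65536 (from the question) is large, so try stepping it down
    conf.put("topology.spout.max.batch.size", 1024);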