Below is a screenshot of my topology's Storm UI. This was taken after the topology finished processing 10k messages.
(The topology is configured with 4 workers and uses a KafkaSpout).
The sum of the "process latency" of my bolts is about 8100ms and the complete latency of the topology is a much longer 115881ms.
I'm aware that this sort of discrepancy can occur due to resource contention or something related to Storm internals. I believe resource contention is not an issue here; the GC didn't run at all during this test, and profiling shows that I have plenty of available CPU.
So I assume the issue is that I am abusing Storm internals in some way. Any suggestions where to look?
Tuples must be waiting somewhere, possibly in the spouts: either waiting to be emitted into the topology, or waiting to be acked once their messages have been processed.
Should I perhaps adjust the number of ackers (I have set the acker count to 4, the same as the number of workers)?
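For reference, these are the topology-level settings I could experiment with (the values below are illustrative placeholders, not my actual settings). My understanding is that `topology.max.spout.pending`, in particular, bounds how many tuples may be in flight per spout task, and leaving it unset lets tuples queue up behind the spout, which would inflate complete latency:

```
# Illustrative topology config overrides (values are placeholders)
topology.acker.executors: 4        # currently matches my worker count
topology.max.spout.pending: 1000   # max unacked tuples per spout task; unset = unbounded
```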
Any other general advice for how I should troubleshoot this?
*Note that the one bolt with a large discrepancy between its process and execute latencies implements the tick-tuple batching pattern, so that discrepancy is expected.
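(For context, the batching in that bolt looks roughly like the sketch below; the class and field names are simplified placeholders, not my actual code. Buffered tuples are only acked when a tick tuple flushes the batch, which is why process latency far exceeds execute latency there.)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.storm.Config;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.utils.TupleUtils;

// Simplified sketch of the tick-tuple batching pattern (placeholder names).
public class BatchingBolt extends BaseRichBolt {
    private final List<Tuple> buffer = new ArrayList<>();
    private OutputCollector collector;

    @Override
    public Map<String, Object> getComponentConfiguration() {
        // Ask Storm to deliver a tick tuple to this bolt every 10 seconds.
        Config conf = new Config();
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 10);
        return conf;
    }

    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context,
                        OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        if (TupleUtils.isTick(tuple)) {
            // Flush the batch, then ack everything that was buffered.
            // Acking is deferred until here, so process latency >> execute latency.
            for (Tuple t : buffer) {
                collector.ack(t);
            }
            buffer.clear();
        } else {
            buffer.add(tuple); // buffered; not acked until the next tick
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // No downstream output in this simplified sketch.
    }
}
```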
*Edit: I suspect the discrepancy might involve messages being acked by the Spout only after being fully processed. If I refresh the Storm UI while the topology is processing, the acked count for my final Bolt increases much more quickly than the acked count for the Spouts. That said, this may simply be because the Spout acks far fewer messages than the final Bolt: a few hundred messages acked by the final Bolt may correspond to a single message in the Spout. Still, I thought I should mention this suspicion to get opinions on whether it's possible that the Spout's acker tasks are overflowing.
