I'm trying to measure an online mini-batch processing system with a per-second metric (total queries per second). For every batch, a metric (e.g. "stats.gauges.<host>.query.count") is sent to Graphite. Batches are processed on several different hosts in parallel, and a batch of data takes about 5 seconds to process.
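To make the target quantity concrete, here is a minimal Python sketch of the per-second value I expect to see. It pools the per-batch counts from all hosts into fixed time windows and divides by the window length. The function name `per_second_rate`, the 10-second window, and the sample data (two hosts, 500 queries per batch) are hypothetical and only illustrate the setup described above.

```python
from collections import defaultdict


def per_second_rate(batches, window_seconds=10):
    """Turn per-batch query counts into a per-second rate.

    batches: iterable of (finish_time_seconds, query_count) tuples,
             pooled across all hosts.
    Returns {window_start_seconds: queries_per_second}.
    """
    totals = defaultdict(int)
    for finish_time, query_count in batches:
        # Assign each batch to the window in which it finished.
        window_start = int(finish_time // window_seconds) * window_seconds
        totals[window_start] += query_count
    # Divide each window's total by its length to get queries per second.
    return {start: total / window_seconds for start, total in sorted(totals.items())}


# Hypothetical data: two hosts, each finishing a 500-query batch every 5 seconds.
batches = [(t, 500) for t in (4, 9, 14, 19)] * 2
rates = per_second_rate(batches, window_seconds=10)
print(rates)
```

With this sample data each 10-second window contains four batches (2,000 queries), which matches the rate computed by hand: 2 hosts × 500 queries / 5 seconds = 200 queries per second.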
I've tried:
- Simply summing the series:
  sumSeries(stats.gauges.*.query.count)
  The resulting metric is many times greater than the actual value.
- Scaling to 1 second:
  scaleToSeconds(sumSeries(stats.gauges.*.query.count), 1)
  The resulting metric is much less than the actual value.
- Integral, then derivative:
  nonNegativeDerivative(sumSeries(integral(stats.gauges.*.query.count)))
  Same as the first case.
- Sending the gauges with the delta=True parameter, then taking the derivative. The result is about 20% greater than the actual value.
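A toy model may help pin down the first two results. This is a simplified Python sketch, not Graphite's implementation: `sum_series` and `scale_to_seconds` are hypothetical stand-ins for sumSeries and scaleToSeconds. It assumes a 10-second storage step (not stated above), two hosts, 500 queries per batch, and a gauge that keeps only the most recent batch count.

```python
def sum_series(series_list):
    """Point-wise sum of equally long series (simplified model of sumSeries)."""
    return [sum(points) for points in zip(*series_list)]


def scale_to_seconds(series, step_seconds, seconds=1):
    """Rescale values from 'per step' to 'per `seconds`' (simplified model of scaleToSeconds)."""
    return [value * seconds / step_seconds for value in series]


# Assumptions (hypothetical): each batch covers 5 seconds, each stored
# datapoint covers 10 seconds, and the gauge holds only the last batch count.
BATCH_SECONDS = 5
STEP_SECONDS = 10
QUERIES_PER_BATCH = 500
host_gauges = [[QUERIES_PER_BATCH] * 3, [QUERIES_PER_BATCH] * 3]

actual_qps = len(host_gauges) * QUERIES_PER_BATCH / BATCH_SECONDS
summed = sum_series(host_gauges)
scaled = scale_to_seconds(summed, STEP_SECONDS)

print(actual_qps, summed, scaled)
```

Under these assumptions the plain sum is a per-batch count (5 seconds' worth of queries), so it overshoots the per-second rate by the batch duration. Scaling by the storage step undershoots because a gauge drops every batch except the last one in each step. Whether this matches the real setup depends on the actual flush interval and retention, which I have not confirmed.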
So, how do I get per-second metrics from batch metrics? What is the best practice?