I've started using collectd (5.5.1), statsd (git head), influxd (1.2), and grafana to monitor my infrastructure. The collectd portion is working fine for blackbox monitoring. We want to use statsd for whitebox monitoring.

The setup we have is collectd and statsd on each host, sending data to influxdb on a monitoring host. Influxdb and grafana run on the monitoring host. Of course, collectd and statsd run on the monitoring host, too, just as they do elsewhere.

My statsd config on each host is simply:

{
    graphitePort: 2003,
    graphiteHost: "monitor.example.com",
    port: 8125,
    backends: [ "./backends/graphite" ]
}

This is probably not ideal anyway, since I've just discovered that there's an influxdb backend available, but I'd expect the above to work even if there's a better option.

I have the following problems, however:

  1. statsd is not forwarding host information to the monitoring host.

  2. My understanding is that one of the benefits of statsd is that it aggregates data on the local host and sends the results at a fixed flush interval. But I can't find where the documentation says how to specify which metrics are summed, which report only the maximum, which report which percentiles, and so on.
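For reference, the statsd README describes aggregation as being chosen per datagram rather than per metric in the config file: the type suffix on each `name:value|type` line decides what statsd does with it (`c` counters are summed over the flush interval, `ms` timers get count, mean, upper, lower and percentiles, `g` gauges keep the last value, `s` sets count unique values). The sketch below only formats such lines; `formatMetric` and the `webapp.*` metric names are hypothetical, not part of statsd.

```javascript
// Sketch: build statsd datagrams in the plain "name:value|type" format.
// The type suffix, not the statsd config file, decides how statsd aggregates a metric.
const TYPES = {
  counter: "c", // summed over each flush interval
  timer: "ms",  // count, mean, upper, lower, plus the configured percentiles
  gauge: "g",   // the last value received wins
  set: "s",     // number of unique values seen
};

// Hypothetical helper, for illustration only.
function formatMetric(name, value, kind, sampleRate) {
  const suffix = TYPES[kind];
  if (suffix === undefined) {
    throw new Error(`unknown metric kind: ${kind}`);
  }
  let line = `${name}:${value}|${suffix}`;
  // Counters and timers accept an optional "|@rate" sample rate;
  // statsd divides by the rate to estimate the true count.
  if (sampleRate !== undefined && sampleRate < 1) {
    line += `|@${sampleRate}`;
  }
  return line;
}

// Hypothetical metric names.
console.log(formatMetric("webapp.requests", 1, "counter"));      // webapp.requests:1|c
console.log(formatMetric("webapp.response_time", 320, "timer")); // webapp.response_time:320|ms
console.log(formatMetric("webapp.queue_depth", 17, "gauge"));    // webapp.queue_depth:17|g
console.log(formatMetric("webapp.requests", 1, "counter", 0.1)); // webapp.requests:1|c|@0.1
```

In other words, the application that emits a metric picks its aggregation by choosing the type; what stays configurable in statsd itself is global (flush interval, which percentiles timers get).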

These components have evolved so rapidly in the past year that quite a lot of documentation and tutorials are out of date, so I'm quite aware I may have done some things that are incorrect simply by having read the wrong documentation.

I've also recently discovered telegraf (to run on each host?). Perhaps I have the wrong expectations of statsd, or should be using telegraf instead?

I'll happily make this question more specific in response to feedback. I'm aware that I'm still struggling with some concepts.

Many thanks for any pointers.

1 Answer

This is a solution without precisely being an answer.

Use telegraf instead

I discovered that telegraf is now a very viable contender in this space: it is well supported, under active development, and talks easily to influxdb. Telegraf also attaches tags (the host name among them) and other metadata to each measurement, whereas statsd appears to be simple key-value. In addition, telegraf replaces both statsd and collectd, which leaves one fewer moving part. That's a good thing.
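To make the tags-versus-key-value difference concrete: InfluxDB's line protocol, which telegraf writes, carries a measurement name, optional `key=value` tags such as `host`, and one or more fields, whereas a plain statsd datagram is just `name:value|type`. The sketch below only builds such a line; `toLineProtocol`, `escapeTag` and the metric names are hypothetical, and it leaves out timestamps and integer/string field types.

```javascript
// Escape commas, equals signs and spaces in tag keys, tag values and field keys,
// as the InfluxDB line protocol requires.
function escapeTag(s) {
  return String(s).replace(/[,= ]/g, (c) => "\\" + c);
}

// Hypothetical helper: measurement[,tag=value...] field=value[,field=value...]
function toLineProtocol(measurement, tags, fields) {
  // Tags sorted by key, which InfluxDB recommends for write performance.
  const tagPart = Object.keys(tags)
    .sort()
    .map((k) => `${escapeTag(k)}=${escapeTag(tags[k])}`)
    .join(",");
  // Numeric field values only; InfluxDB reads a bare number as a float.
  const fieldPart = Object.keys(fields)
    .map((k) => `${escapeTag(k)}=${fields[k]}`)
    .join(",");
  // Measurement names escape only commas and spaces.
  const measurementPart = measurement.replace(/[, ]/g, (c) => "\\" + c);
  return tagPart
    ? `${measurementPart},${tagPart} ${fieldPart}`
    : `${measurementPart} ${fieldPart}`;
}

console.log(toLineProtocol("webapp_requests", { host: "web01", env: "prod" }, { value: 1 }));
// webapp_requests,env=prod,host=web01 value=1
```

The host travels as a tag rather than being baked into the series name, so queries can group or filter by host without any naming convention.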

About statsd

(TL;DR - maybe I'm lame)

I couldn't figure out how to forward host information at all with statsd.
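One workaround, sketched here rather than tested against this setup: because a statsd metric is only a dotted name and a value, the sending application can put the host name into the metric name itself, and the graphite backend then carries it through as part of the series path. The `hostPrefixed` and `sendCounter` helpers and the `hosts.` prefix are hypothetical; the datagram goes to the local statsd on port 8125, as in the config above.

```javascript
const dgram = require("dgram");
const os = require("os");

// Hypothetical helper: prefix a metric with this machine's host name.
// Dots in the host name are replaced because statsd/graphite treat "." as a path separator.
function hostPrefixed(metric, hostname = os.hostname()) {
  const safeHost = hostname.split(".").join("_");
  return `hosts.${safeHost}.${metric}`;
}

// Hypothetical helper: send one counter increment to the statsd daemon
// on this host over UDP (fire-and-forget, no acknowledgement).
function sendCounter(metric, host = "127.0.0.1", port = 8125) {
  const socket = dgram.createSocket("udp4");
  const payload = Buffer.from(`${hostPrefixed(metric)}:1|c`);
  socket.send(payload, port, host, (err) => {
    if (err) console.error("statsd send failed:", err.message);
    socket.close();
  });
}

console.log(hostPrefixed("webapp.requests", "web01.example.com"));
// hosts.web01_example_com.webapp.requests
```

Calling `sendCounter("webapp.requests")` on each host should then yield one series per host, nested under whatever prefix the graphite backend adds.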

I didn't find good documentation on how to configure aggregation (though I remember finding it once). The backend for sending data over the influxdb protocol did not seem to be keeping pace with influxdb development.
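For what it's worth, statsd's `exampleConfig.js` documents the aggregation settings as global rather than per metric: `flushInterval` (milliseconds between flushes, default 10000) and `percentThreshold` (the percentiles computed for every timer, default `[90]`). The graphite backend also documents a `graphite.globalPrefix` option, which, as I read those docs, applies only when `graphite.legacyNamespace` is `false`; giving each host its own prefix is one possible way to keep per-host series apart. The values below are assumptions, and `web01` is a hypothetical host name. statsd's config file contains just the object literal; it is assigned to a variable here only so the sketch can be loaded with Node.

```javascript
// statsd's config file holds only the object literal; the variable is for inspection.
const statsdConfig = {
  graphitePort: 2003,
  graphiteHost: "monitor.example.com",
  port: 8125,
  backends: ["./backends/graphite"],
  flushInterval: 10000,           // ms between flushes to the backend (default 10000)
  percentThreshold: [90, 95, 99], // percentiles computed for every timer (default [90])
  graphite: {
    legacyNamespace: false,       // assumed necessary for globalPrefix to apply
    globalPrefix: "stats.web01",  // hypothetical per-host prefix; set differently on each host
  },
};

console.log(JSON.stringify(statsdConfig.percentThreshold)); // [90,95,99]
```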

On both of these points I may be entirely wrong, and alternative answers that document these things are most welcome.