I've started using collectd (5.5.1), statsd (git head), InfluxDB (1.2), and Grafana to monitor my infrastructure. The collectd portion is working fine for blackbox monitoring. I want to use statsd for whitebox monitoring.
The setup is collectd and statsd on every host, sending data to InfluxDB on a monitoring host. InfluxDB and Grafana run only on the monitoring host, which also runs collectd and statsd like every other host.
My statsd config on each host is simply:
{
  graphitePort: 2003,
  graphiteHost: "monitor.example.com",
  port: 8125,
  backends: [ "./backends/graphite" ]
}
This is probably not ideal in any case, as I just discovered that there's an influxdb backend available, but I expect the above to work even if I can do better.
I have the following problems, however:
1. statsd is not forwarding host information to the monitor host, so I can't tell which host a metric came from.
2. I've understood that one of the benefits of statsd is that it can aggregate data on the local host before sending it at specified flush intervals. But I don't see in the documentation where to specify, say, which metrics are summed, which are reported as a maximum, which are reported as percentiles (and which percentiles), etc.
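To make concrete what I mean by local aggregation, here is a minimal sketch in plain JavaScript of the behaviour I'm expecting. This is only an illustration of the concept, not statsd's actual code: createAggregator, count, timing, and flush are names I made up, and the nearest-rank percentile is my own assumption about how a percentile would be computed.

```javascript
// Hypothetical illustration only: NOT statsd's implementation.
// Raw samples are collected per metric name and reduced at flush time.
function createAggregator() {
  const counters = new Map(); // metric name -> running sum
  const timers = new Map();   // metric name -> array of samples (ms)

  return {
    // Add to a counter; these are summed over the flush interval.
    count(name, value = 1) {
      counters.set(name, (counters.get(name) || 0) + value);
    },
    // Record one timing sample in milliseconds.
    timing(name, ms) {
      if (!timers.has(name)) timers.set(name, []);
      timers.get(name).push(ms);
    },
    // Reduce everything collected since the last flush, then reset.
    flush(percentile = 90) {
      const out = {};
      for (const [name, sum] of counters) {
        out[`${name}.sum`] = sum;
      }
      for (const [name, samples] of timers) {
        const sorted = [...samples].sort((a, b) => a - b);
        // Nearest-rank percentile (an assumption made for this sketch).
        const rank = Math.max(1, Math.ceil((percentile * sorted.length) / 100));
        out[`${name}.max`] = sorted[sorted.length - 1];
        out[`${name}.p${percentile}`] = sorted[rank - 1];
      }
      counters.clear();
      timers.clear();
      return out;
    },
  };
}
```

What I can't find is where, in a real statsd setup, I would choose between reductions like the sum, max, and percentile above on a per-metric basis.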
These components have evolved so rapidly in the past year that quite a lot of documentation and tutorials are out of date, so I'm quite aware I may have done some things that are incorrect simply by having read the wrong documentation.
I've also recently discovered Telegraf (to run on each host?). Perhaps I have the wrong expectations of statsd, or should use Telegraf instead?
I'll happily make this question more specific in response to feedback. I'm aware that I'm still struggling with some concepts.
Many thanks for pointers.