Could anyone suggest the best pattern for gathering metrics from a cluster of nodes (each node is a Tomcat Docker container running a Java app)?

We're planning to use the ELK stack (Elasticsearch, Logstash, Kibana) as a visualization tool, but the question for us is how the metrics should be delivered to Kibana.

We're using the Dropwizard Metrics library, which provides per-instance metrics (gauges, timers, histograms).
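For context, here is a minimal sketch of how we register a timer (class and metric names are illustrative, not our actual code):

    import com.codahale.metrics.MetricRegistry;
    import com.codahale.metrics.Timer;

    public class ApiMetrics {
        // One registry per JVM; each Tomcat container reports its own values.
        private static final MetricRegistry registry = new MetricRegistry();
        // A timer gives per-instance percentiles plus 1/5/15-minute rates.
        private static final Timer responseTimes = registry.timer("api.response-time");

        public static void handleRequest(Runnable work) {
            // Timer.Context is Closeable; elapsed time is recorded on close.
            try (Timer.Context ignored = responseTimes.time()) {
                work.run();
            }
        }
    }

Each container keeps its own registry, which is exactly why all the values come out per instance.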

Some metrics obviously should be gathered per instance (e.g. CPU, memory) - it doesn't make sense to aggregate them per cluster.

But for metrics such as average API response times and database call durations, we want a clear global picture - i.e. not per individual instance.

And here is where we're hesitating. Should we:

  1. Just send plain gauge values to Elasticsearch and let Kibana calculate averages, percentiles, etc.? In this approach, all aggregation happens in Kibana (see the sketch after this list).
  2. Use timers and histograms per instance and send those instead? But since this data is already aggregated per instance (i.e. a timer already provides percentiles and 1-minute, 5-minute, and 15-minute rates), how should Kibana handle it to show a global picture? Does it make sense to aggregate already-aggregated data (for example, per-instance percentiles can't simply be averaged into a correct global percentile)?
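To make option 1 concrete, here is a minimal sketch of shipping one raw document per measurement straight to Elasticsearch, so that all averaging and percentile math happens at query time in Kibana. The host, index name, and field names are assumptions for illustration; in practice Logstash or a Beat would likely sit in between:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Instant;

    public class RawMetricShipper {
        private static final HttpClient client = HttpClient.newHttpClient();

        // Ship one raw document per measurement; Kibana aggregates at query time.
        // "metrics-raw", the host, and the field names are hypothetical.
        public static void shipResponseTime(String node, long millis) throws Exception {
            String doc = String.format(
                "{\"@timestamp\":\"%s\",\"node\":\"%s\",\"api_response_ms\":%d}",
                Instant.now(), node, millis);
            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/metrics-raw/_doc"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(doc))
                .build();
            client.send(request, HttpResponse.BodyHandlers.ofString());
        }
    }

The trade-off with this approach is index volume: one document per request, rather than one pre-aggregated snapshot per reporting interval.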

Thanks in advance,

1 Answer


You will want to use Metricbeat. It provides modules for system-level metrics, the Docker API, and Dropwizard. It will collect the events for you (without any pre-aggregation).
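For the Dropwizard side, a minimal sketch of the module entry in metricbeat.yml could look like the following; the host, port, and metrics_path are assumptions for your deployment, and they require the application to expose its Dropwizard metrics over HTTP (e.g. via the metrics servlet):

    # Sketch of a dropwizard module entry in metricbeat.yml.
    # hosts and metrics_path are assumptions for your deployment.
    - module: dropwizard
      metricsets: ["collector"]
      period: 10s
      hosts: ["localhost:8080"]
      metrics_path: /metrics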

For the aggregation and visualization, I'd use Kibana's Time Series Visual Builder, where you can aggregate per container, node, service, and so on. It should be flexible enough to give you the right data granularity.