
I have an Apache Ignite cluster which exposes metrics (both from caches and nodes).

Metrics are updated at Ignite's standard update interval; then, once every 5 seconds, I gather them into Prometheus.

What happens is that some of these metrics always show a value of 0, while others show meaningful values. Here's an example:

[Screenshot: CPU usage and cache size graphs]

Here, in the left graph, I query for avg(ignite_average_cpu_load) and for avg(ignite_current_gc_cpu_load). The first one is shown correctly, while the second one always reports 0 (note: if an error occurs and Prometheus does not gather anything, it shows a null value, so it is Ignite itself that gives me that 0).

The right graph makes it more evident: I was inserting ~25k cache entries per second at the moment of the screenshot, but the timings are not shown.

I activated metrics in the cache configuration (before Ignition.start()) with cacheConfiguration.setStatisticsEnabled(true), and I gather them with

val clusterMetrics = ignite.cluster().forLocal().metrics()

and

val cacheMetrics = cache.localMetrics()

The node which calls cache.put IS NOT the same one that stores the cache itself. Every other setting is left at its default. My gathering service ticks every 5 seconds.
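For completeness, here is roughly what the whole setup looks like end to end (a simplified sketch: the class layout, cache name and key/value types are placeholders, and the real gathering service exports the values to Prometheus instead of printing them):

import org.apache.ignite.Ignition
import org.apache.ignite.configuration.CacheConfiguration
import org.apache.ignite.configuration.IgniteConfiguration
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

fun main() {
    // Statistics are enabled on the cache configuration before Ignition.start()
    val cacheConfiguration = CacheConfiguration<String, ByteArray>("myCache")
        .setStatisticsEnabled(true)

    val ignite = Ignition.start(IgniteConfiguration().setCacheConfiguration(cacheConfiguration))
    val cache = ignite.getOrCreateCache<String, ByteArray>("myCache")

    // Gathering service: every 5 seconds read node metrics and local cache metrics
    Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate({
        val clusterMetrics = ignite.cluster().forLocal().metrics()
        val cacheMetrics = cache.localMetrics()

        // The real service exports these to Prometheus; printed here for brevity
        println("average CPU load:    ${clusterMetrics.averageCpuLoad}")
        println("current GC CPU load: ${clusterMetrics.currentGcCpuLoad}")
        println("cache size:          ${cacheMetrics.cacheSize}")
        println("average put time:    ${cacheMetrics.averagePutTime}")
    }, 5, 5, TimeUnit.SECONDS)
}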

What could be the issue?


1 Answer


There is a ticket for broken average timing metrics: https://issues.apache.org/jira/browse/IGNITE-3495

I think you've encountered the same issue.

As for the current GC CPU metric, it hasn't been noticed to show inadequate values before. Maybe it's just always near 0?

What is the averaging that you perform? Maybe it makes the resulting value always end up near 0?
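To check that, you could dump the raw per-node value before any averaging happens, roughly like this (a quick sketch; dumpGcCpuLoad is just a made-up helper name, it simply reads the standard ClusterNode.metrics() values):

import org.apache.ignite.Ignite

// Print the raw GC CPU load of every server node, before Prometheus and avg()
// are involved, to see whether the metric is genuinely near 0 at the source.
fun dumpGcCpuLoad(ignite: Ignite) {
    for (node in ignite.cluster().forServers().nodes()) {
        val m = node.metrics()
        println("node=${node.id()} currentGcCpuLoad=${m.currentGcCpuLoad} averageCpuLoad=${m.averageCpuLoad}")
    }
}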