
I am using Prometheus and Grafana to monitor some servers. One of the metrics I expose is called recent_tables, which holds the number of assets that have written to SQL tables in the past 15 minutes (machines post to SQL automatically). Its labels are table, job, and status_code. I also have a metric called online_assets, which holds the number of assets that are online. Its labels are cluster_id, db_host, and job.

I am trying to create an alert that fires when fewer than 90% of online assets have written to SQL tables recently. Before writing the alert, I am trying to get a Grafana panel to show the data, and I will eventually move the query into the expr of a Prometheus alerting rule. The following queries do not work, and I don't understand why:

recent_tables < online_assets * 0.9

sum(recent_tables) by (table) < online_assets * 0.9

However, the following query works:

sum(recent_tables{table="<table>"}) - sum(online_assets)

I would rather not write a separate alert for every table (although this is possible through Ansible). I would like to understand whether a single query can return one result per table.

To give an informed answer, we would need the labels of each metric. A sample would be best. - Michael Doubez
Hi Michael, live_assets has the labels cluster_id, db_host, and job. Ex: live_assets{cluster_id="east", db_host="pgserver1", job="live_asset_count"} 2000. recent_tables has the labels job, status_code, and table. Ex: recent_tables{table="user_accounts", status_code="200", job="recent_user_accounts"} 1500 - py_guy_5
AFAIS, there is no way to relate a table to the corresponding number of assets. What do you want to compare, and on which dimension? You have to tell the operators how to match elements (using on()). And if there are more elements on one side, you have to indicate it with group_left or group_right. - Michael Doubez
You were exactly right; I wish I could upvote this response. The query I needed was: sum(recent_tables) by (table) - ignoring(table) group_left() sum(live_assets) * 0.9 < 0. I am sure this can be simplified, but it will do for now. I was missing the fact that metrics with different label sets cannot be combined directly. Thank you! - py_guy_5

1 Answer


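As Michael Doubez pointed out, a binary operator between two vectors only pairs series whose label sets match, so you cannot combine vectors with different label dimensions without telling Prometheus how to match them.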

I ended up with the following: sum(recent_tables) by (table) - ignoring(table) group_left() sum(live_assets) * 0.9 < 0

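This accounts for the mismatch in dimensions: ignoring(table) leaves the table label out of the match, and group_left() lets the many per-table series on the left pair with the single summed series on the right. There may be a cleaner way.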
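
To make the matching behaviour concrete, here is a small Python toy model of how a binary operator pairs series. It is only a sketch for illustration, not Prometheus code: the helper names match_key and subtract are made up, and the "orders" table with its value is a hypothetical second table added so that there is more than one series on the left. The other numbers come from the samples in the comments.

```python
# Toy model of PromQL vector matching for a binary operator.
# A series is a (labels, value) pair; labels is a plain dict.

def match_key(labels, ignoring=()):
    """Label set used for matching, minus any ignored label names."""
    return frozenset((k, v) for k, v in labels.items() if k not in ignoring)

def subtract(left, right, ignoring=(), group_left=False):
    """Compute left - right for every left series that has a partner."""
    rhs = {match_key(labels, ignoring): value for labels, value in right}
    out, seen = [], set()
    for labels, value in left:
        key = match_key(labels, ignoring)
        if key not in rhs:
            continue  # no partner on the right, so the series is dropped
        if key in seen and not group_left:
            raise ValueError("many-to-one match requires group_left")
        seen.add(key)
        out.append((labels, value - rhs[key]))
    return out

# sum(recent_tables) by (table): one series per table.
by_table = [
    ({"table": "user_accounts"}, 1500.0),
    ({"table": "orders"}, 1900.0),  # hypothetical second table
]
# sum(live_assets) * 0.9: a single series with no labels.
threshold = [({}, 2000 * 0.9)]

# Default matching: {table=...} never equals {}, so nothing pairs up.
default = subtract(by_table, threshold)

# ignoring(table) group_left(): every table pairs with the one threshold.
matched = subtract(by_table, threshold, ignoring=("table",), group_left=True)

# The trailing "< 0" keeps only tables below 90% of live assets.
alerting = [labels["table"] for labels, diff in matched if diff < 0]

print(default)
print(alerting)
```

In this model the default match returns nothing because the left series carry a table label and the right series carries none, which is the same reason the original queries came back empty.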