I am using Prometheus and Grafana to monitor some servers. One of the metrics I have exposed is called recent_tables, which contains the number of assets that have written to SQL tables in the past 15 minutes (machines automatically post to SQL). Its labels are table, job, and status_code. I also have a metric online_assets, which has the number of assets that are online. Its labels are cluster_id, db_host, and job.
I am trying to make an alert for when < 90% of online assets have written to sql tables recently. Before I write the alert, I am trying to get a panel in grafana to populate the data and eventually transition this to an alertmanager expr. The following queries do not work, and I don't understand why:
recent_tables < online_assets * 0.9
sum(recent_tables) by (table) < online_assets * 0.9
However, the following query works:
sum(recent_tables{table="<table>"}) - sum(online_assets)
I do not want to have to make an alert based on every table (this is possible through ansible), but I would like to understand if there is a way to get multiple vectors out of the same query.
live_assets{cluster_id="east", db_host="pgserver1", job="live_asset_count"} 2000. recent_tables has labels job, status_code, and table. Ex: recent_tables{table="user_accounts", status_code="200", job="recent_user_accounts"} 1500 – py_guy_5
ON()). And if there are more elements on one side, you have to indicate it with group_left or group_right. – Michael Doubez
sum(recent_tables) by (table) - ignoring(table) group_left() sum(live_assets) * 0.9 < 0. I am sure this can be simplified but this will do for now. I was missing the fact that metrics with uneven labels could not be merged. Thank you! – py_guy_5
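For anyone landing here later, here is a sketch of how the working expression from the comments might be tightened. By default, a PromQL binary operator only pairs series whose label sets are identical on both sides, which is why the first two queries in the question return nothing: no recent_tables series shares its full label set with an asset-count series. The metric names below come from this thread (live_assets as in the comments; the question calls it online_assets, so substitute whichever name your exporter actually exposes), and the 0.9 threshold comes from the question. This has not been run against the poster's data.

```promql
# Left side: one series per table. Right side: a single series with no labels.
# on() matches on an empty label set, and group_left() lets the many
# left-hand series pair with the one right-hand series.
sum by (table) (recent_tables)
  < on() group_left()
sum(live_assets) * 0.9

# Alternative: collapse the total to a scalar, so no vector matching is needed.
sum by (table) (recent_tables) < scalar(sum(live_assets)) * 0.9
```

Either form should return one series per table that falls below the threshold, so a single rule can cover every table. One difference to keep in mind: if live_assets is missing entirely, scalar() evaluates to NaN and the comparison returns nothing, so a separate absence check may be worth adding.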