I currently have some alerts set up to report when subscription/pull_request_count is 0. However, in a similar question about that metric, I found that metrics and alerting break once there is no data, which I believe happens when there are no subscriptions.

My intent is to figure out if my servers have stopped pulling messages. There are 2 scenarios I have in mind where the details are important.

  1. Even if there are no messages being published, I want to know if I'm no longer pulling from a subscription to make sure things are working properly.
  2. In the event that a ton of unacknowledged messages are queued up just because I pulled them but didn't ack them (e.g. a partner API was down), I don't want this alert to be triggered.

Besides using subscription/pull_request_count as a condition, which won't work when no data is coming in (at least after a while), how can I set up an alert that notifies me that there are no clients pulling from a Pub/Sub subscription?

Do you really want to check if there are no clients pulling from a Pub/Sub subscription, or do you just want to check if there are undelivered messages because there is no client pulling from the subscription (no client => undelivered messages)? - norbjd
Good question. My intent is to figure out if my servers have stopped pulling messages. There are 2 scenarios I have in mind where the details are important. 1) Even if there are no messages being published, I want to know if I'm no longer pulling from a subscription to make sure things are working properly. 2) In the event that a ton of unacknowledged messages are queued up just because I pulled them but didn't ack them (e.g. a partner API was down), I don't want this alert to be triggered. - jon_wu
Ok, so based on your 2 scenarios, you sadly can't use another metric like num_undelivered_messages. You should include these use cases in your question because they are very relevant! Anyway, it seems that subscription/pull_request_count is the metric you want to use here. Have you found out why this metric sometimes does not generate data? Could you clarify what "alerting break once there is no data" means? Is the alert triggered? How is your alert configured now? Please add this information to your question; it may help other people figure out what is going on. - norbjd
I've added these examples to the question. For details on alerting breaking, see the link in my question to a recent question I posted, where GCP support mentioned that the metric meant there is missing data. - jon_wu
Why don't you use "is absent" as the policy condition instead of "is under 1"? The minimum duration seems to be 3 minutes, though. - Guillem Xercavins

Answer

Since you want to be alerted when there are no pull message operations, you'll have to use the subscription/pull_request_count metric. If, after some time, the metric is dropped instead of reporting 0 pulls, you can use two conditions combined with OR: "is absent" for 3 minutes, or "is below 1" for 1 minute.
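To make the OR logic concrete, here is a simplified offline model in shell. It is only an illustration under assumptions: each argument stands for one aligned 60-second value of subscription/pull_request_count (integers only, for simplicity), `-` stands for a period with no data point, and the function name `should_alert` is hypothetical. It does not reproduce how Cloud Monitoring actually evaluates policies.

```shell
#!/usr/bin/env bash
# Simplified, hypothetical model of the two-condition policy.
# NOT Cloud Monitoring's real evaluator: it only shows how an OR combiner
# reacts to a series of one-minute samples.
# Each argument is one aligned 60s value of pull_request_count (integer),
# or "-" when no data point was reported for that minute.
should_alert() {
  absent_run=0  # consecutive minutes without a data point
  for sample in "$@"; do
    if [ "$sample" = "-" ]; then
      absent_run=$((absent_run + 1))
      # Condition 1 ("is absent"): no data for 3 minutes (180s)
      if [ "$absent_run" -ge 3 ]; then
        echo "alert: metric absent for ${absent_run} min"
        return 0
      fi
    else
      absent_run=0
      # Condition 2 ("is below 1"): value under 1 for one 60s period
      if [ "$sample" -lt 1 ]; then
        echo "alert: pull count ${sample} below 1"
        return 0
      fi
    fi
  done
  echo "ok"
}
```

For example, `should_alert 5 0` prints `alert: pull count 0 below 1`, while `should_alert 5 3 - - 2` prints `ok` because data came back before three empty minutes had passed. Either condition alone is enough to fire, which is exactly what covers both "the metric reports 0" and "the metric disappears".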

However, the problem here is that the UI filters out all resources and metrics that have not been used in the past 6 weeks. While this greatly eases setting up alerts and browsing metrics for running operations, it requires a different approach to create new alerts before a system is in production. The easiest solution is to create a dummy subscription and pull messages from it so that the metric appears.
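A minimal sketch of that workaround with the gcloud CLI could look like this. The topic and subscription names (`dummy-topic`, `dummy-sub`) and the `run`/`bootstrap_dummy_subscription` helpers are hypothetical; setting `DRY_RUN=1` only prints the commands, which is handy for reviewing them before touching a real project.

```shell
#!/usr/bin/env bash
# Hypothetical names for the throwaway resources; change them as needed.
TOPIC="dummy-topic"
SUB="dummy-sub"

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create a topic and subscription, publish one message and pull it once,
# so that subscription/pull_request_count starts reporting data.
bootstrap_dummy_subscription() {
  run gcloud pubsub topics create "$TOPIC"
  run gcloud pubsub subscriptions create "$SUB" --topic="$TOPIC"
  run gcloud pubsub topics publish "$TOPIC" --message="hello"
  # The pull request itself is what populates pull_request_count
  run gcloud pubsub subscriptions pull "$SUB" --auto-ack --limit=1
}
```

After the alert has been created, the dummy resources can be removed with `gcloud pubsub subscriptions delete dummy-sub` and `gcloud pubsub topics delete dummy-topic`.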

Alternatively, you can still set the alerts up through the Stackdriver Monitoring API (I tested this with a Spanner metric in a workspace that had no instances for the last few months). Keep in mind that the alerting policies API is in Beta, so it's subject to non-backwards-compatible changes.

I'd recommend starting by inspecting an existing policy with projects.alertPolicies/list to see how the AlertPolicy body is constructed.
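A minimal sketch of that list call with curl, assuming `TOKEN` and `PROJECT` already hold an access token and a project ID (for example, set as in the variables step). The helper names are hypothetical; the list call is a GET on the same collection URL that the create call POSTs to, and the response wraps the policies in an `alertPolicies` array.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around projects.alertPolicies.list.
# Assumes TOKEN (access token) and PROJECT (project ID) are already set.

# Collection URL shared by the list (GET) and create (POST) calls
alert_policies_url() {
  echo "https://monitoring.googleapis.com/v3/projects/$1/alertPolicies"
}

# Print the raw JSON response listing the project's alert policies
list_alert_policies() {
  curl -s -H "Authorization: Bearer ${TOKEN}" "$(alert_policies_url "$PROJECT")"
}
```

With `jq` installed (an extra dependency), `list_alert_policies | jq '.alertPolicies[0]'` prints the first policy, including its `conditions` and `notificationChannels`.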

Then you can set some initial variables:

```shell
TOKEN="$(gcloud auth print-access-token)"
PROJECT="$(gcloud config get-value project 2>/dev/null)"
SUBSCRIPTION=PUBSUB_SUBSCRIPTION_ID
CHANNEL=NOTIFICATION_CHANNEL_ID
```
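`gcloud config get-value project` typically yields an empty value when no default project is configured, and a forgotten placeholder would only surface later as an API error. A small guard, sketched here with a hypothetical `check_vars` function, can catch both before any request is sent.

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail if any variable is empty or still holds one of
# the placeholder values from the variables snippet.
check_vars() {
  missing=0
  for name in TOKEN PROJECT SUBSCRIPTION CHANNEL; do
    # Portable indirect lookup: copy the variable named by $name into $value
    eval "value=\${$name-}"
    case "$value" in
      ""|PUBSUB_SUBSCRIPTION_ID|NOTIFICATION_CHANNEL_ID)
        echo "$name is not set" >&2
        missing=1
        ;;
    esac
  done
  return "$missing"
}
```

Running `check_vars || exit 1` right before the curl call stops a script early instead of sending a request to `projects//alertPolicies`.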

In my case, I am monitoring only one specific Pub/Sub subscription throughout the example, and I already had a notification channel (for my email). Since you also have an existing policy, you can get the notification channel ID from its notificationChannels field.
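Policies store channels as full resource names of the form `projects/PROJECT_ID/notificationChannels/CHANNEL_ID` (the same form used in the `notificationChannels` list of the policy body), while `CHANNEL` holds only the last segment. A hypothetical one-line helper to extract it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the bare channel ID from a full resource name
# such as "projects/my-project/notificationChannels/1234567890".
channel_id_from_name() {
  # Remove the longest prefix ending in "/", leaving the last path segment
  printf '%s\n' "${1##*/}"
}
```

For example, `CHANNEL=$(channel_id_from_name "projects/my-project/notificationChannels/1234567890")` sets `CHANNEL` to `1234567890`. If your gcloud installation includes the beta monitoring commands, `gcloud beta monitoring channels list` should also show the full channel names.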

With projects.alertPolicies/create you can create the new alert policy:

```shell
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  "https://monitoring.googleapis.com/v3/projects/$PROJECT/alertPolicies" \
  -d @alert.json
```

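where alert.json contains the following (replace $PROJECT, $SUBSCRIPTION and $CHANNEL with your own values, since curl sends the file without expanding shell variables):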

```json
{
  "displayName": "no-pull-alert",
  "combiner": "OR",
  "conditions": [
    {
      "conditionAbsent": {
        "filter": "metric.type=\"pubsub.googleapis.com/subscription/pull_request_count\" resource.type=\"pubsub_subscription\" resource.label.\"project_id\"=\"$PROJECT\" resource.label.\"subscription_id\"=\"$SUBSCRIPTION\"",
        "duration": "180s",
        "trigger": {
          "count": 1
        },
        "aggregations": [
          {
            "alignmentPeriod": "60s",
            "perSeriesAligner": "ALIGN_RATE"
          }
        ]
      },
      "displayName": "Pull requests absent for $PROJECT, $SUBSCRIPTION"
    },
    {
      "conditionThreshold": {
        "filter": "metric.type=\"pubsub.googleapis.com/subscription/pull_request_count\" resource.type=\"pubsub_subscription\" resource.label.\"project_id\"=\"$PROJECT\" resource.label.\"subscription_id\"=\"$SUBSCRIPTION\"",
        "comparison": "COMPARISON_LT",
        "thresholdValue": 1,
        "duration": "60s",
        "trigger": {
          "count": 1
        },
        "aggregations": [
          {
            "alignmentPeriod": "60s",
            "perSeriesAligner": "ALIGN_RATE"
          }
        ]
      },
      "displayName": "Pull requests are 0 for $PROJECT, $SUBSCRIPTION"
    }
  ],
  "documentation": {
    "content": "**ALERT**\n\nNo pull message operations",
    "mimeType": "text/markdown"
  },
  "notificationChannels": [
    "projects/$PROJECT/notificationChannels/$CHANNEL"
  ],
  "enabled": true
}
```

Briefly: you don't need to pass policy or condition IDs, as the API populates them. Use OR as the combiner (the policy is violated when ANY condition is met) so that the alert triggers when the metric is either absent (conditionAbsent) or below 1 (conditionThreshold). Of course, you can also modify the parameters, display names, descriptions, etc. to better suit your use case.
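curl posts alert.json as written, so the `$PROJECT`, `$SUBSCRIPTION` and `$CHANNEL` placeholders in it are not expanded by the shell. One way to fill them in, sketched below under the assumption that the variables from earlier are set, is a hypothetical `render_template` function built on `sed`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a template file with the literal placeholders
# $PROJECT, $SUBSCRIPTION and $CHANNEL replaced by the values of the shell
# variables of the same names. Assumes the values contain no "|" or "&".
render_template() {
  # Inside double quotes "\\\$NAME" reaches sed as \$NAME, i.e. a literal "$NAME";
  # the JSON's escaped quotes (\") are left untouched.
  sed -e "s|\\\$PROJECT|${PROJECT}|g" \
      -e "s|\\\$SUBSCRIPTION|${SUBSCRIPTION}|g" \
      -e "s|\\\$CHANNEL|${CHANNEL}|g" \
      "$1"
}
```

The rendered body can then be piped straight into the create call: `render_template alert.json | curl -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" "https://monitoring.googleapis.com/v3/projects/$PROJECT/alertPolicies" -d @-`, where `-d @-` makes curl read the request body from standard input.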
