Google Compute Engine autoscale based on 'used' memory

Question

I'm looking to scale my Compute Engine instances based on memory, which is an agent metric in Stackdriver. The caveat is that, of the 5 states the agent can monitor (buffered, cached, free, slab, used; see the link here), I only want to look at 'used' memory. If that value is above a certain percentage threshold across the group (per-instance would also work for me), I want to autoscale.
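To make the goal concrete, here is a minimal Python sketch of the check described above: take the share of memory in the 'used' state on each instance and compare the group average with a threshold. The sample values and the helper names `percent_used` and `should_scale_out` are hypothetical and not part of any Google API. The sketch also assumes the five states together account for all memory.

```python
from statistics import mean

# The five memory states reported by the monitoring agent.
STATES = ("buffered", "cached", "free", "slab", "used")

def percent_used(mem_by_state):
    """Return the 'used' state's share of total memory, as a percentage."""
    total = sum(mem_by_state[s] for s in STATES)
    return 100.0 * mem_by_state["used"] / total

def should_scale_out(instances, threshold_pct):
    """True when the group's average 'used' percentage exceeds the threshold."""
    return mean(percent_used(i) for i in instances) > threshold_pct

# Hypothetical sample data for two instances (values in MiB).
group = [
    {"buffered": 100, "cached": 400, "free": 200, "slab": 100, "used": 3200},
    {"buffered": 100, "cached": 500, "free": 600, "slab": 100, "used": 2700},
]

print(should_scale_out(group, threshold_pct=70))
```

The autoscaler would have to perform the equivalent of this check itself, which is why the filter on the `state` label matters.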

I've already installed the Stackdriver Monitoring agent on all the nodes in the Managed Instance Group, and I can successfully visualize 'used' memory in my monitoring dashboard, which I'm well acquainted with.

Unfortunately, I can't do it for autoscaling. This is what I see when I go to configure it in the autoscaling section of MIG.

I believe adding a filter expression should work as expected, since the same expression works correctly in the Stackdriver console using the Monitoring dashboard. Also, it's mentioned here that the syntax is compatible with the Cloud Monitoring filter syntax that is given here.

I've tried different combinations for the syntax in the filter expression field but none of them has worked. Please help.

Your metric identifier already indicates percent memory used. Wouldn't it be redundant to put the same thing in the additional filter? — Wilfred L.
@WilfredL., Stackdriver can monitor 5 states of memory (buffered, cached, free, slab and used). See link here: cloud.google.com/monitoring/api/metrics_agent#agent-memory. I only want to autoscale based on what's being 'used'. — Pranay Nanda
I tried to replicate this and got the same issue, but the autoscaling documentation never mentions that it can be fine-tuned in such a way. You might need to open a public issue on the issue tracker [1] so Google can better work on this. [1] issuetracker.google.com — Wilfred L.
Okay, in that case I don't really understand what the filter expression is for. — Pranay Nanda
The only syntax I've been able to use that passes the UI validation is metric.label.state="used". However, when I do this, I get an error afterwards saying "Regional managed instance groups do not support autoscaling using per-group metrics.". So if you're not using a regional instance group, it may just work for your use case. — Matthew

rqueue · Accepted Answer · 2021-01-30T06:22:27

I was attempting the exact same configuration to scale based on memory usage. After trying various entries without success, I reached out to Google support. Based on your question, I can't tell what kind of instance group you have, and it matters for the reasons below.

TLDR

Based on input from Google support, only zonal instance groups allow the filter expression entry.

Zonal Instance Group

Only zonal instance groups will allow the filter expression. The setting you are attempting to enter, metric.state=used, is correct for a zonal instance group. However, that field must be left blank for a regional instance group.
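For anyone configuring this outside the console, here is a hedged Python sketch of what the autoscaling policy for a zonal group might look like as a request body. The field names follow my understanding of the Compute Engine autoscaler resource (`customMetricUtilizations`, `filter`, `utilizationTarget`, `utilizationTargetType`), so verify them against the API reference before relying on them. The helper name `memory_autoscaling_policy`, the replica counts and the target of 70 are made up for illustration, and the filter string is the form a commenter reported as passing UI validation.

```python
import json

def memory_autoscaling_policy(min_replicas, max_replicas, target_pct, filter_expr=None):
    """Build an autoscaling policy body that scales on agent memory percent_used."""
    metric = {
        "metric": "agent.googleapis.com/memory/percent_used",
        "utilizationTarget": target_pct,
        "utilizationTargetType": "GAUGE",
    }
    if filter_expr:  # only attach a filter when one is given (zonal MIGs)
        metric["filter"] = filter_expr
    return {
        "minNumReplicas": min_replicas,
        "maxNumReplicas": max_replicas,
        "customMetricUtilizations": [metric],
    }

# Hypothetical values; the filter string is taken from the comments above.
policy = memory_autoscaling_policy(1, 5, 70, 'metric.label.state="used"')
print(json.dumps(policy, indent=2))
```

Calling the helper without `filter_expr` produces the body with no filter, which corresponds to leaving the field blank for a regional group.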

Regional Instance Group

As noted above, applying the filter to a regional instance group is not supported. Google's documentation says to leave that field blank:

In the Additional filter expression section:

For a zonal MIG, optionally enter a filter to use individual values from metrics with multiple streams or labels. For more information, see Filtering per-instance metrics.

For a regional MIG, leave this section blank.

If you add an entry, you'll receive the message "Regional managed instance groups do not support autoscaling using per-group metrics." when attempting to save your changes.
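The rule above can be summarized as a small pre-flight check. This is only an illustrative sketch of the behavior described in this answer, not Google's validation logic. The function name `check_filter` and the scope strings "zonal" and "regional" are hypothetical.

```python
REGIONAL_FILTER_ERROR = (
    "Regional managed instance groups do not support autoscaling "
    "using per-group metrics."
)

def check_filter(mig_scope, filter_expr):
    """Return an error message if the filter is not allowed for this MIG scope, else None."""
    if mig_scope not in ("zonal", "regional"):
        raise ValueError(f"unknown MIG scope: {mig_scope!r}")
    # Treat None, empty, and whitespace-only strings as "field left blank".
    has_filter = bool(filter_expr and filter_expr.strip())
    if mig_scope == "regional" and has_filter:
        return REGIONAL_FILTER_ERROR
    return None

print(check_filter("zonal", 'metric.label.state="used"'))
print(check_filter("regional", 'metric.label.state="used"'))
```

Running a check like this before submitting a configuration would surface the regional limitation earlier than the save step does.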

On the other hand, if you leave the field empty, it will save. However, I found that leaving the field empty and setting almost any number in the Target Utilization field always caused my group to scale to the maximum number of instances.

Summary

Google informed me that they do have a feature request for this. I communicated that it didn't make sense to even have the option to select percent_used if it's not supported. The response was that we should see the documentation updated in the future to clarify that point.