How do I get TotalStorageSpace or UsedStorageSpace metric from AWS RDS?

I see that AWS RDS provides a FreeStorageSpace metric for monitoring disk usage. I am trying to create a generic pre-emptive alert for all my RDS instances, but no single threshold on FreeStorageSpace makes sense for all of them.

For example, 20G might be a good threshold for an RDS instance with 100G of total disk space, but it would be misleading for an instance with only 40G.

So I was wondering if there is a way to get TotalStorageSpace or UsedStorageSpace metric from RDS (~~directly~~ or indirectly).

Update

Since it is now established that FreeStorageSpace is the only disk storage metric RDS provides, any ideas on whether, and how, we can build a custom metric for TotalStorageSpace or UsedStorageSpace?

P.S.: Creating a separate alarm for each RDS instance just to evaluate disk usage percentage seems like a waste of time and resources.

Tags: amazon-web-services, alert, amazon-rds, datadog

3 Answers

votes

According to the docs, FreeStorageSpace is the only storage space metric you can get.

I can only assume their logic is that you already know your total allocated space, so with the FreeStorageSpace value you can calculate how much is used.
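
To make that arithmetic concrete, here is a minimal sketch. It assumes you already have the instance's allocated storage (the RDS API reports `AllocatedStorage` in GiB) and a recent FreeStorageSpace datapoint (reported in bytes). The function name is made up for illustration, and the result is only approximate because filesystem overhead is not accounted for.

```python
GIB = 1024 ** 3  # AllocatedStorage is in GiB; FreeStorageSpace is in bytes


def used_percent(allocated_gib: int, free_bytes: float) -> float:
    """Approximate percentage of allocated storage in use (hypothetical helper)."""
    total_bytes = allocated_gib * GIB
    used_bytes = total_bytes - free_bytes
    return 100.0 * used_bytes / total_bytes


# The same 20 GiB of free space means very different things on different instances.
print(used_percent(allocated_gib=100, free_bytes=20 * GIB))  # 80.0
print(used_percent(allocated_gib=40, free_bytes=20 * GIB))   # 50.0
```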

votes

First, you can check storage-related info in the monitoring section of AWS RDS.

> Now I am trying to create a generic pre-emptive alert for all my RDS but setting up an ideal threshold on FreeStorageSpace is not making sense. For example, 20G might be a good threshold with RDS having total disk space as 100G but might be misleading for a RDS with total disk space of 40G.

If your instances have different storage sizes, you need to configure multiple alarms, one per size. A single generic alarm will not work, because the threshold on FreeStorageSpace is an absolute value in bytes, not a percentage.

How can I create CloudWatch alarms to monitor the Amazon RDS free storage space and prevent storage full issues?

Short Description

Create alarms in the CloudWatch console or use the AWS Command Line Interface (AWS CLI) to create alarms that monitor free storage space. By creating CloudWatch alarms that notify you when the FreeStorageSpace metric reaches a defined threshold, you can prevent storage full issues. This can prevent downtime that occurs when your RDS DB instance runs out of storage.

Resolution

- Open the CloudWatch console, and choose Alarms from the navigation pane.
- Choose Create alarm, and choose Select metric.
- From the All metrics tab, choose RDS.
- Choose Per-Database Metrics.
- Search for the FreeStorageSpace metric.
- For the instance that you want to monitor, choose the DB instance Identifier FreeStorageSpace metric.
- In the Conditions section, configure the threshold. For example, choose Lower/Equal, and then specify the threshold value.

Note: You must specify the value for the parameter in bytes. For example, 10 GB is 10737418240 bytes.
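
If you would rather script this than click through the console for every instance, the following is a rough sketch of how the per-instance byte threshold could be derived from a percentage of allocated storage. The helper name, the alarm naming scheme, and the 20% default are my own assumptions; the dictionary keys are the parameter names that boto3's CloudWatch `put_metric_alarm` accepts. The sketch only builds the parameters and does not call AWS.

```python
GIB = 1024 ** 3  # 10 GiB == 10737418240 bytes, as in the note above


def free_space_alarm_params(instance_id: str, allocated_gib: int,
                            min_free_percent: int = 20) -> dict:
    """Build put_metric_alarm parameters for one instance (hypothetical helper)."""
    # Integer arithmetic keeps the byte threshold exact.
    threshold_bytes = allocated_gib * GIB * min_free_percent // 100
    return {
        "AlarmName": f"{instance_id}-low-free-storage",  # assumed naming scheme
        "Namespace": "AWS/RDS",
        "MetricName": "FreeStorageSpace",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 1,
        "ComparisonOperator": "LessThanOrEqualToThreshold",
        "Threshold": float(threshold_bytes),
    }


params = free_space_alarm_params("my-db", allocated_gib=50)
print(params["Threshold"])  # 10737418240.0
```

With credentials configured, something like `boto3.client("cloudwatch").put_metric_alarm(**params)` would create the alarm; you would normally also add `AlarmActions` so that someone gets notified.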

For more details, see storage-full-rds-cloudwatch-alarm.

votes

If you enable Enhanced Monitoring, the RDSOSMetrics log group in CloudWatch Logs will contain detailed JSON log messages that include filesystem statistics. I ended up creating a CloudWatch Logs metric filter to parse out the usedPercent value from the fileSys attribute for the root filesystem. At least for PostgreSQL, these detailed logs include both the / and /rdsdbdata filesystems; the latter is the one that matters for storage space.

You can create a metric filter of the form {$.instanceID = "My_DB_Instance_Name" && $.fileSys[0].mountPoint = "/rdsdbdata"} with a corresponding metric value of $.fileSys[0].usedPercent to get the used storage percentage for a given instance. This is then available as a log metric that you can use to trigger an alarm. You probably need to create a second metric filter that replaces fileSys[0] with fileSys[1], since the ordering of that array is not guaranteed. You'd probably want to create these for each RDS instance you have so you know which one is running out of space, but your question seems to indicate you don't want a per-instance alarm.
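
If you process the log messages in code instead (for example in a Lambda subscribed to the log group, which is an assumption on my part), the array-ordering problem goes away because you can search fileSys by mountPoint. A minimal sketch follows; the sample message is trimmed and its values are invented, with only the field names taken from the Enhanced Monitoring JSON.

```python
import json
from typing import Optional


def used_percent(message: str, mount_point: str = "/rdsdbdata") -> Optional[float]:
    """Return usedPercent for the given mount point, or None if it is absent."""
    record = json.loads(message)
    for fs in record.get("fileSys", []):
        if fs.get("mountPoint") == mount_point:
            return fs.get("usedPercent")
    return None


# Trimmed, made-up example of one RDSOSMetrics log message.
sample = json.dumps({
    "instanceID": "My_DB_Instance_Name",
    "fileSys": [
        {"mountPoint": "/", "usedPercent": 34.1},
        {"mountPoint": "/rdsdbdata", "usedPercent": 71.5},
    ],
})
print(used_percent(sample))  # 71.5
```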

I suppose you could exclude the $.instanceID from the metric filter and just get all values written to a single metric. When it reached a threshold and triggered an alarm, you'd need to start checking to see which instance is responsible.
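
For that "which instance is responsible" step, here is a rough sketch that scans a batch of Enhanced Monitoring messages and reports every instance at or above a threshold. How you fetch the messages is left out, and the function name and sample instance names are hypothetical.

```python
import json


def instances_over(messages, threshold_percent, mount_point="/rdsdbdata"):
    """Map instanceID -> highest usedPercent seen, keeping only offenders."""
    worst = {}
    for raw in messages:
        record = json.loads(raw)
        for fs in record.get("fileSys", []):
            if fs.get("mountPoint") == mount_point:
                instance = record.get("instanceID")
                worst[instance] = max(worst.get(instance, 0.0), fs["usedPercent"])
    return {name: pct for name, pct in worst.items() if pct >= threshold_percent}


# Made-up messages from two instances.
batch = [
    json.dumps({"instanceID": "db-a", "fileSys": [{"mountPoint": "/rdsdbdata", "usedPercent": 42.0}]}),
    json.dumps({"instanceID": "db-b", "fileSys": [{"mountPoint": "/rdsdbdata", "usedPercent": 91.0}]}),
    json.dumps({"instanceID": "db-b", "fileSys": [{"mountPoint": "/rdsdbdata", "usedPercent": 93.5}]}),
]
print(instances_over(batch, threshold_percent=80))  # {'db-b': 93.5}
```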