
I've been tasked with monitoring a data integration job, and I'm trying to figure out the best way to do this using CloudWatch metrics.

The data integration task populates records in 3 database tables. What I'd like to do is publish custom metrics each day, with the number of rows that have been inserted for each table. If the row count for one or more tables is 0, then it means something has gone wrong with the integration scripts, so we need to send alerts.

My question is how to most logically structure the calls to put-metric-data.

I'm thinking of the data being structured something like this...

  • Namespace: Integrations/IntegrationProject1
  • Metric Name: RowCount
  • Metric Dimension: Table, with values "Table1", "Table2", "Table3"
  • Metric Values: 10, 100, 50

Does this make sense, or should it logically be structured in some other way? There is no inherent relationship between the tables, other than that they're all associated with a particular project. What I mean is, I don't want to be inferring some kind of meaningful progression from 10 -> 100 -> 50.

Is this something that can be done with a single call to the CloudWatch put-metric-data, or would it need to be 3 separate calls?

Separate calls, I think, would look something like this...

aws cloudwatch put-metric-data --metric-name RowCount --namespace "Integrations/IntegrationProject1" --unit Count --value 10 --dimensions Table=Table1

aws cloudwatch put-metric-data --metric-name RowCount --namespace "Integrations/IntegrationProject1" --unit Count --value 100 --dimensions Table=Table2

aws cloudwatch put-metric-data --metric-name RowCount --namespace "Integrations/IntegrationProject1" --unit Count --value 50 --dimensions Table=Table3

This seems like it should work, but is there some more efficient way I can do this, and combine it into a single call?

Also, is there a way I can specify that the data has a resolution of only 24 hours?


1 Answer


Your structure looks fine to me. Consider also adding a dimension for your deployment stage, e.g. Stage=beta|gamma|prod.

This seems like it should work, but is there some more efficient way I can do this, and combine it into a single call?

You can, even with the AWS CLI: put-metric-data also accepts a --metric-data option that takes a list of metric entries. Likewise, any SDK (e.g. Boto3 for Python) lets you publish up to 20 metrics in a single PutMetricData call.
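A minimal sketch of the combined call with Boto3 (the namespace, metric name, and Table dimension are taken from the question; the helper names are mine, and actually running the publish step requires AWS credentials):

```python
import datetime

def build_metric_data(row_counts, timestamp=None):
    """Build one MetricData entry per table, for a single PutMetricData call."""
    ts = timestamp or datetime.datetime.now(datetime.timezone.utc)
    return [
        {
            "MetricName": "RowCount",
            "Dimensions": [{"Name": "Table", "Value": table}],
            "Timestamp": ts,
            "Value": float(count),
            "Unit": "Count",
        }
        for table, count in row_counts.items()
    ]

def publish_row_counts(row_counts):
    # boto3 imported lazily so the payload builder stays usable without AWS
    import boto3
    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_data(
        Namespace="Integrations/IntegrationProject1",
        MetricData=build_metric_data(row_counts),
    )

# publish_row_counts({"Table1": 10, "Table2": 100, "Table3": 50})
```

All three row counts go out in one network round trip instead of three, and adding a fourth table is just another entry in the dict.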

Also is there a way I can qualify that the data has a resolution of only 24 hours?

No. CloudWatch aggregates the data it receives on your behalf. If you want to see a daily datapoint, change the period to 1 day when graphing the metric in the CloudWatch console.
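The same 1-day period can be requested programmatically. A sketch with Boto3's GetMetricStatistics (namespace and metric name from the question; the helper names and 14-day window are my own assumptions):

```python
import datetime

def daily_rowcount_request(table, days=14):
    """Build GetMetricStatistics kwargs returning one Sum datapoint per day."""
    end = datetime.datetime.now(datetime.timezone.utc)
    return {
        "Namespace": "Integrations/IntegrationProject1",
        "MetricName": "RowCount",
        "Dimensions": [{"Name": "Table", "Value": table}],
        "StartTime": end - datetime.timedelta(days=days),
        "EndTime": end,
        "Period": 86400,          # 24 hours, in seconds
        "Statistics": ["Sum"],
    }

def fetch_daily_rowcounts(table):
    # boto3 imported lazily; the actual call requires AWS credentials
    import boto3
    cloudwatch = boto3.client("cloudwatch")
    return cloudwatch.get_metric_statistics(**daily_rowcount_request(table))["Datapoints"]
```

With Period=86400 CloudWatch rolls everything published in each 24-hour window into one datapoint, regardless of how the underlying data was stored.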