I have configured Aurora read replica auto scaling on my RDS cluster with a target average CPU utilisation of 60%, scale-in enabled, and a scale-in/scale-out period of 300 seconds. The minimum capacity of the cluster is 1 and the maximum is 2.
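For reference, a setup like the one described above can be written as Application Auto Scaling request parameters. This is only a sketch: the cluster identifier `my-aurora-cluster` and the policy name `reader-cpu-60` are placeholders, and treating the 300-second period as both the scale-in and the scale-out cooldown is an assumption.

```python
# Sketch only: CLUSTER_ID and POLICY_NAME are hypothetical placeholders.
CLUSTER_ID = "my-aurora-cluster"
POLICY_NAME = "reader-cpu-60"


def scalable_target_params(cluster_id=CLUSTER_ID):
    """Parameters for register_scalable_target: between 1 and 2 replicas."""
    return {
        "ServiceNamespace": "rds",
        "ResourceId": f"cluster:{cluster_id}",
        "ScalableDimension": "rds:cluster:ReadReplicaCount",
        "MinCapacity": 1,
        "MaxCapacity": 2,
    }


def scaling_policy_params(cluster_id=CLUSTER_ID, policy_name=POLICY_NAME):
    """Parameters for put_scaling_policy: track 60% average reader CPU."""
    return {
        "PolicyName": policy_name,
        "ServiceNamespace": "rds",
        "ResourceId": f"cluster:{cluster_id}",
        "ScalableDimension": "rds:cluster:ReadReplicaCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": 60.0,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "RDSReaderAverageCPUUtilization"
            },
            # Assumption: the 300 s period applies to both cooldowns.
            "ScaleInCooldown": 300,
            "ScaleOutCooldown": 300,
            "DisableScaleIn": False,  # scale-in enabled
        },
    }


def apply_configuration(autoscaling_client):
    """Send both requests using a boto3 'application-autoscaling' client."""
    autoscaling_client.register_scalable_target(**scalable_target_params())
    return autoscaling_client.put_scaling_policy(**scaling_policy_params())
```

With boto3 installed and credentials configured, this would be called as `apply_configuration(boto3.client("application-autoscaling"))`.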

The replicas seem to scale out as expected, and the new replica works, but the auto scaling activity always fails with this message:

"Tried to add read replica(s) <ID>. Failed to determine if read replica(s) have been added by rds. Reason: One or more of the created DB instances transitioned to states other than 'available'."

However, the replicas never get removed. Once auto scaling adds one, it sits there forever. I suspect this is because RDS thinks the replica failed to add, but I have no idea how to fix it. The closest thing to a cause that I've found is table locks, but I need to use table locks in my application, and it needs to scale during the hours it's used the most.

Note that your terminology seems reversed. "Scale Out" means adding capacity ("out" is used to mean "becoming more broad"), and "Scale In" means reducing capacity ("in" is used to mean "becoming more narrow"). - Michael - sqlbot
@Michael-sqlbot Thanks Michael, I didn't realise. I've corrected it now. - Nbody Nbody
Honestly, this seems like it could be a defect or weakness in RDS -- an unhandled edge case -- rather than a misconfiguration on your part. Based on that, I think the official forum is the place for this question, and it looks like you posted it there. You will want to fix the body, here, where you said they "scale in just fine" -- even though the rest of the paragraph is about scaling out. I made that edit, above. Feel free to correct it and clarify if that was not your intended meaning. - Michael - sqlbot

1 Answer

This was my fault: I had deleted the CloudWatch alarm that handled the removal of RDS nodes.
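One way to check for this situation is to compare the alarms that the scaling policy references with the alarms that actually exist in CloudWatch. The following is a rough sketch, not a verified diagnostic: it assumes the policy still lists the alarms it originally created, that there are at most 100 of them (no pagination), and that the two arguments are boto3 `application-autoscaling` and `cloudwatch` clients. The function name is hypothetical.

```python
def missing_policy_alarms(autoscaling, cloudwatch, cluster_id):
    """Return alarm names referenced by scaling policies but absent in CloudWatch."""
    policies = autoscaling.describe_scaling_policies(
        ServiceNamespace="rds",
        ResourceId=f"cluster:{cluster_id}",
    )["ScalingPolicies"]

    # Each target tracking policy lists the alarms it manages.
    expected = [
        alarm["AlarmName"]
        for policy in policies
        for alarm in policy.get("Alarms", [])
    ]
    if not expected:
        return []

    # Ask CloudWatch which of those alarms it still knows about.
    found = cloudwatch.describe_alarms(AlarmNames=expected)["MetricAlarms"]
    existing = {alarm["AlarmName"] for alarm in found}
    return [name for name in expected if name not in existing]
```

A non-empty result would suggest that an alarm was deleted outside the policy, as happened here.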

AWS Support advised me to delete and re-create the auto scaling policy so that the CloudWatch alarms would be recreated and it would scale properly.
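A minimal sketch of that delete-and-recreate step, assuming a boto3 `application-autoscaling` client and a `policy_params` dict holding the same arguments that were originally passed to `put_scaling_policy`. The helper name is hypothetical, and this is not the exact procedure AWS Support described.

```python
def recreate_scaling_policy(autoscaling, policy_params):
    """Delete the scaling policy, then put it again so its alarms are recreated.

    Returns the names of the CloudWatch alarms reported for the new policy.
    """
    autoscaling.delete_scaling_policy(
        PolicyName=policy_params["PolicyName"],
        ServiceNamespace=policy_params["ServiceNamespace"],
        ResourceId=policy_params["ResourceId"],
        ScalableDimension=policy_params["ScalableDimension"],
    )
    # Putting a target tracking policy creates its alarms on the caller's behalf.
    response = autoscaling.put_scaling_policy(**policy_params)
    return [alarm["AlarmName"] for alarm in response.get("Alarms", [])]
```

The returned names can then be checked in the CloudWatch console to confirm that both the scale-out and the scale-in alarm exist again.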

The error messages I was seeing were caused by an issue that Amazon was aware of, and they seem to have since disappeared by themselves.