
I have run into a strange SQS situation for which I can't find a satisfying answer. I created a delay queue that should delay (what a surprise) incoming events for 4 seconds, after which they should be processed by a Lambda function. Order is not an issue here. The issue is that the "ApproximateAgeOfOldestMessage" metric (statistic: Maximum) sometimes climbs to over 1 minute, which is odd because there aren't that many messages, as you can see in the picture. I would expect each event to be processed immediately after the 4-second delay. The reserved concurrency of the Lambda function is 50, so the SQS poller should have no problem invoking more Lambda instances if there is too much traffic. But traffic isn't really a problem, as you can see.

The queue is configured like this:

Default visibility timeout: 120 sec
Delivery delay: 4 sec
Dead-letter queue: No (it is only one event, generated by AWS, so no poison pills)
Message retention period: 4 days
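For anyone who wants to reproduce the setup, the queue settings above could be expressed with boto3 roughly as follows. This is only a sketch: the helper names `delay_queue_attributes` and `create_delay_queue` and the queue name passed in are hypothetical, and the queue in the question may well have been created through the console instead.

```python
def delay_queue_attributes(delay_s=4, visibility_timeout_s=120, retention_days=4):
    """Build the SQS attribute map for the queue described in the question.

    SQS expects every attribute value as a string, with durations in seconds.
    """
    return {
        "DelaySeconds": str(delay_s),
        "VisibilityTimeout": str(visibility_timeout_s),
        "MessageRetentionPeriod": str(retention_days * 24 * 60 * 60),
    }


def create_delay_queue(queue_name):
    """Create the queue (needs AWS credentials; queue_name is a placeholder)."""
    import boto3  # imported lazily so the helper above works without boto3

    sqs = boto3.client("sqs")
    response = sqs.create_queue(
        QueueName=queue_name,
        Attributes=delay_queue_attributes(),
    )
    return response["QueueUrl"]
```

No redrive policy is set, which matches the "no dead-letter queue" setting.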

SQS Metrics

The lambda config:

Batch size: 5 (also tried 1 and 10; not much of a difference for the metric in question)

Batch window: None

Reserved concurrency: 50

Timeout: 20 sec

Lambda Metrics

I can't explain those old messages (ApproximateAgeOfOldestMessage). Any help would be highly appreciated.
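The Lambda settings listed above could be applied with boto3 along these lines. Again a sketch under assumptions: `lambda_trigger_settings` and `apply_lambda_settings` are hypothetical helpers, the queue ARN and function name are placeholders, and "Batch window: None" is taken to mean a batching window of 0 seconds.

```python
def lambda_trigger_settings(queue_arn, function_name):
    """Collect the parameters for the three Lambda API calls involved."""
    return {
        # SQS trigger: batch size 5, no batching window
        "event_source_mapping": {
            "EventSourceArn": queue_arn,
            "FunctionName": function_name,
            "BatchSize": 5,
            "MaximumBatchingWindowInSeconds": 0,
        },
        # Reserved concurrency of 50
        "concurrency": {
            "FunctionName": function_name,
            "ReservedConcurrentExecutions": 50,
        },
        # Function timeout of 20 seconds
        "configuration": {
            "FunctionName": function_name,
            "Timeout": 20,
        },
    }


def apply_lambda_settings(queue_arn, function_name):
    """Apply the settings (needs AWS credentials; both arguments are placeholders)."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("lambda")
    settings = lambda_trigger_settings(queue_arn, function_name)
    client.update_function_configuration(**settings["configuration"])
    client.put_function_concurrency(**settings["concurrency"])
    client.create_event_source_mapping(**settings["event_source_mapping"])
```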

Best

Patrick


1 Answer


I contacted AWS Support. Apparently it is a bug on the AWS side:

Response from AWS Support:

I have just received an update from the backend service team and the team has confirmed that they have identified an issue of unexpected spikes in "ApproximateAgeOfOldestMessage" metrics that triggers when messages are sent to SQS with a configured delay. This issue's root cause is that our internal system uses recently processed delayed messages to calculate the "ApproximateAgeOfOldestMessage," which results in a higher than the actual value for "ApproximateAgeOfOldestMessage" metrics. They have now identified a fix for this issue and will start deploying the fix soon. After this update, when messages are sent to Amazon SQS with a configured delay, you may see the "ApproximateAgeOfOldestMessage" metrics value come down for the queues to the accurate value.

So if you encounter the same problem, you will have to wait for that fix. I hope it comes soon.
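In the meantime, one way to check whether the spikes are still showing up on your own queue is to pull the metric from CloudWatch and list the data points above a threshold. This is a rough sketch, not something AWS Support suggested: the helper names, the 3-hour lookback, and the 60-second threshold are my own assumptions, and the queue name is a placeholder.

```python
from datetime import datetime, timedelta, timezone


def oldest_message_age_query(queue_name, end_time, lookback_hours=3):
    """Build get_metric_statistics parameters for the Maximum statistic."""
    return {
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateAgeOfOldestMessage",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "StartTime": end_time - timedelta(hours=lookback_hours),
        "EndTime": end_time,
        "Period": 60,  # seconds per data point
        "Statistics": ["Maximum"],
    }


def spikes_over(datapoints, threshold_s=60):
    """Return (timestamp, age) pairs above the threshold, oldest first."""
    return sorted(
        (p["Timestamp"], p["Maximum"])
        for p in datapoints
        if p["Maximum"] > threshold_s
    )


def fetch_spikes(queue_name):
    """Query CloudWatch (needs AWS credentials; queue_name is a placeholder)."""
    import boto3  # imported lazily so the helpers above work without boto3

    cloudwatch = boto3.client("cloudwatch")
    params = oldest_message_age_query(queue_name, datetime.now(timezone.utc))
    response = cloudwatch.get_metric_statistics(**params)
    return spikes_over(response["Datapoints"])
```

If `fetch_spikes` keeps returning entries far above the 4-second delay while the queue is nearly empty, you are probably still seeing the metric issue described above.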