2
votes

I have a few AWS Lambda functions, but I am troubleshooting just one of them. This function is triggered by a message queue, reads from DynamoDB, processes the data, and writes back to DynamoDB. It handles up to 10 requests per second, and I have configured provisioned concurrency for it. The average duration is 60 ms, which I am very happy with. But every day there are around 10 invocations whose duration exceeds 1 second, some running up to the 3-second timeout.

I added logging to the Lambda function. During the duration spikes, the DynamoDB reads and writes (GetItem/PutItem) took more than 1 second. The DynamoDB table is set to on-demand capacity. It is a very simple table with two columns: an ID (auto number) and a JSON string (about 1 KB). I also tried Redis but, oddly enough, still saw spikes. The Lambda function is not in a VPC. The DynamoDB client is configured with an HTTP timeout of 500 and a maximum of 2 retries.

Code to read DynamoDB:

[image: code to read DynamoDB]

Log for Duration:

[image: log for duration]

1
As a test, does this problem go away if you increase the Lambda function's RAM size to the max? - jarmod
Not entirely. With the RAM size increased, I can see the average duration drop, but the spikes still exist, although I am not sure whether their frequency decreased. Thanks. - Richard
@Richard - updated the post with formatting and displayed the images by default; for future reference, actual code and logs are preferred over images, as text can be searched and more easily tested. - Nimantha

1 Answer

1
votes

With provisioned concurrency, the Lambda service keeps a set number of the underlying execution environments "warm" to minimize startup time. Since you are seeing intermittently high execution durations, here are some debugging steps you can try:

  • Check the "Concurrent Executions" metric for the Lambda function against the "Duration" metric: if the number of instances of the function executing at a particular time is higher than the configured provisioned concurrency, that would imply that a few of these instances had cold starts, causing the higher duration.

  • Enable X-Ray tracing for the Lambda function and also add X-Ray instrumentation to your code: this would show which network call is taking too long and would also report the cold start "init" duration (if any).
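
The first check above can also be scripted instead of compared by eye in the console. The following is only a rough sketch in Python with boto3, not necessarily the language your function uses. The function name `my-function`, the provisioned concurrency of 5, and the 1000 ms "slow" threshold are placeholders I made up, so substitute your own values. It also assumes the function-level metrics are the ones you want; since provisioned concurrency is configured on a version or alias, you may need to add a `Resource` dimension for that alias.

```python
from datetime import datetime, timedelta, timezone


def fetch_series(function_name, metric_name, stat, hours=24, period=60):
    """Fetch one AWS/Lambda metric from CloudWatch as {timestamp: value}."""
    import boto3  # imported lazily so the analysis below runs without AWS access

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName=metric_name,
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=period,  # seconds per datapoint
        Statistics=[stat],
    )
    return {point["Timestamp"]: point[stat] for point in response["Datapoints"]}


def classify_spikes(concurrency, duration_ms, provisioned, slow_ms=1000):
    """Split slow periods by whether concurrency exceeded the provisioned value.

    Returns (over, within): timestamps of slow periods where concurrency was
    above the provisioned concurrency, and slow periods where it was not.
    """
    over, within = [], []
    for ts in sorted(duration_ms):
        if duration_ms[ts] < slow_ms:
            continue  # not a spike
        if concurrency.get(ts, 0) > provisioned:
            over.append(ts)  # cold starts are a plausible cause here
        else:
            within.append(ts)  # look elsewhere, e.g. the DynamoDB calls
    return over, within


if __name__ == "__main__":
    # "my-function" and provisioned=5 are hypothetical placeholders.
    concurrency = fetch_series("my-function", "ConcurrentExecutions", "Maximum")
    duration = fetch_series("my-function", "Duration", "Maximum")
    over, within = classify_spikes(concurrency, duration, provisioned=5)
    print("slow periods above provisioned concurrency:", over)
    print("slow periods within provisioned concurrency:", within)
```

If most of the slow periods land in the second list, cold starts are unlikely to be the cause, and the X-Ray traces from the second step are the better place to look.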