AWS DynamoDB stream Lambda processes too quickly

2

votes

I have a DynamoDB table that I send data into. A stream on the table is processed by a Lambda, which rolls up some stats and inserts them back into the table.

My issue is that the Lambda processes the events too quickly: almost every insert triggers a write back to the DynamoDB table, and those writes are causing throttling.

I need to slow my lambda down!

I have set the concurrency to 1.

I had thought about just putting a sleep statement into the Lambda code, but that would be billable time.

Can I delay the Lambda so that it only starts once every x minutes?

amazon-web-services aws-lambda amazon-dynamodb

3

votes

You can't easily limit how often the Lambda runs, but you could re-architect things a little bit and use a scheduled CloudWatch Event as a trigger instead of your DynamoDB stream. Then you could have the Lambda execute every x minutes, collate the stats for records added since the last run, and push them to the table.
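As a rough sketch of the "collate the stats for records added since the last run" step, the function below aggregates only the items newer than the previous run. The attribute names `created_at` (epoch seconds) and `value` are hypothetical, and how the items are fetched and where the last-run timestamp is stored are left out.

```python
from typing import Any, Iterable, Mapping


def roll_up(items: Iterable[Mapping[str, Any]], last_run: int) -> dict:
    """Aggregate items created after `last_run` (epoch seconds).

    `created_at` and `value` are hypothetical attribute names.
    """
    count = 0
    total = 0
    watermark = last_run
    for item in items:
        created = item["created_at"]
        if created <= last_run:
            continue  # already covered by a previous roll-up
        count += 1
        total += item["value"]
        watermark = max(watermark, created)
    # `last_run` in the result is the timestamp to persist for the next run.
    return {"count": count, "total": total, "last_run": watermark}
```

The scheduled Lambda would then write one aggregated item per run instead of one per insert.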

1

votes

I have never tried this myself, but I think you could do the following:

Put a delay queue between the stream and your Lambda.

That is, you would have a new Lambda function that just pushes events from the DDB stream to an SQS queue. You can set a delay of up to 15 minutes on the queue. Then set up your original Lambda to be triggered by the messages in this queue. Be wary of SQS limits, though.
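A minimal sketch of the forwarding Lambda, assuming a standard (non-FIFO) queue and a boto3-style SQS client passed in as `sqs_client`. The function names and `queue_url` are made up for illustration. It sets the delay per message with `DelaySeconds`; configuring the delay on the queue itself would work as well.

```python
import json
from typing import Any, Iterator, List, Mapping, Sequence

MAX_BATCH = 10   # SendMessageBatch accepts at most 10 entries per call
MAX_DELAY = 900  # the 15-minute delay limit, in seconds


def to_entries(records: Sequence[Mapping[str, Any]],
               delay_seconds: int) -> Iterator[List[dict]]:
    """Yield SendMessageBatch entry lists of at most 10 messages each."""
    if not 0 <= delay_seconds <= MAX_DELAY:
        raise ValueError("delay_seconds must be between 0 and 900")
    for start in range(0, len(records), MAX_BATCH):
        chunk = records[start:start + MAX_BATCH]
        yield [
            {
                "Id": str(start + i),  # must be unique within one batch
                "MessageBody": json.dumps(record["dynamodb"]),
                "DelaySeconds": delay_seconds,
            }
            for i, record in enumerate(chunk)
        ]


def forward(event: Mapping[str, Any], sqs_client: Any,
            queue_url: str, delay_seconds: int = MAX_DELAY) -> int:
    """Push every stream record in `event` to the queue; return the count."""
    sent = 0
    for entries in to_entries(event["Records"], delay_seconds):
        # A real handler should also inspect the "Failed" list in the response.
        sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        sent += len(entries)
    return sent
```

Note that this only shifts the writes later in time. The downstream Lambda still has to aggregate several messages per invocation to reduce the number of writes.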

0

votes

No, unfortunately you cannot do it.

Having the concurrency set to 1 will definitely help, but it won't solve the problem. What you could do instead is slightly increase the table's WCUs to prevent throttling, since it is the writes back to the table that are being throttled.

To circumvent the problem though, @bwest's approach seems very good. I'd go with that.

0

votes

Instead of adding a delay or setting the concurrency to 1, you can do the following:

Increase the batch size, so that you process several events together. This introduces some delay and also costs less money.
Instead of writing the data back to DynamoDB, write it to another store where you are charged by the amount of memory you use rather than by WCU.
Have a CloudWatch-triggered Lambda that takes the data from this temporary store and writes it back to DynamoDB.

This ensures a few things:

You can control the lag, i.e. the staleness of the aggregated data. (For example, you can define two flush conditions, say 15 minutes or 1000 events, whichever comes first.)
Your Lambda won't have to discard events when it is writing aggregated data very often. (This problem remains even if you use SQS.)
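The "15 minutes or 1000 events, whichever comes first" policy could look roughly like the sketch below. `RollupBuffer` is a made-up, in-memory stand-in for the temporary store. In a real deployment the counts would live in an external store, because Lambda memory does not reliably persist between invocations.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RollupBuffer:
    """Hypothetical stand-in for the temporary store plus its flush policy."""
    max_events: int = 1000
    max_age_seconds: float = 15 * 60
    counts: Dict[str, int] = field(default_factory=dict)
    events: int = 0
    opened_at: Optional[float] = None  # time the first buffered event arrived

    def add(self, key: str, now: float) -> None:
        """Record one event for `key` at time `now` (epoch seconds)."""
        if self.opened_at is None:
            self.opened_at = now
        self.counts[key] = self.counts.get(key, 0) + 1
        self.events += 1

    def should_flush(self, now: float) -> bool:
        """True once either limit is reached; an empty buffer never flushes."""
        if self.events == 0:
            return False
        too_many = self.events >= self.max_events
        too_old = now - self.opened_at >= self.max_age_seconds
        return too_many or too_old

    def drain(self) -> Dict[str, int]:
        """Return the aggregated counts and reset the buffer."""
        out = self.counts
        self.counts, self.events, self.opened_at = {}, 0, None
        return out
```

The scheduled Lambda would call `should_flush` on each run and write the result of `drain` to DynamoDB only when it returns true.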

0

votes

As per the Lambda docs:

"By default, Lambda invokes your function as soon as records are available in the stream. If the batch it reads from the stream only has one record in it, Lambda only sends one record to the function. To avoid invoking the function with a small number of records, you can tell the event source to buffer records for up to 5 minutes by configuring a batch window. Before invoking the function, Lambda continues to read records from the stream until it has gathered a full batch, or until the batch window expires."

Using this, you can add a bit of a delay, and perhaps process the batch sequentially once you receive it. Also, since faster execution is not your priority, you will save cost as well: there are fewer Lambda invocations, and you avoid paying for sleep time. From the AWS Lambda docs: "You are charged based on the number of requests for your functions and the duration, the time it takes for your code to execute."
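A small sketch of how the batch window might be configured and used, assuming boto3. `mapping_settings` and `count_by_event_name` are hypothetical helpers, and the UUID in the usage note is a placeholder for the identifier of your existing event source mapping.

```python
from collections import Counter
from typing import Any, Dict, Mapping

MAX_WINDOW_SECONDS = 300  # the 5-minute ceiling from the quoted docs


def mapping_settings(mapping_uuid: str, batch_size: int,
                     window_seconds: int) -> Dict[str, Any]:
    """Build keyword arguments for update_event_source_mapping."""
    if not 0 <= window_seconds <= MAX_WINDOW_SECONDS:
        raise ValueError("batch window must be between 0 and 300 seconds")
    return {
        "UUID": mapping_uuid,
        "BatchSize": batch_size,
        "MaximumBatchingWindowInSeconds": window_seconds,
    }


def count_by_event_name(event: Mapping[str, Any]) -> Dict[str, int]:
    """Collapse one stream batch into a count per event name (e.g. INSERT)."""
    return dict(Counter(record["eventName"] for record in event["Records"]))
```

The settings could then be applied with something like `boto3.client("lambda").update_event_source_mapping(**mapping_settings("your-mapping-uuid", 1000, 300))`, after which the handler writes one aggregated item per batch rather than one per record.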
