
The DynamoDB documentation says that a stream consists of shards, that you first have to iterate over the shards, and then, for each shard, fetch its records.

The documentation also says:

(If you use the DynamoDB Streams Kinesis Adapter, this is handled for you: Your application will process the shards and stream records in the correct order, and automatically handle new or expired shards, as well as shards that split while the application is running. For more information, see Using the DynamoDB Streams Kinesis Adapter to Process Stream Records.)

OK, but I use Lambda, not Kinesis (or are they related to each other?). If a Lambda function is attached to a DynamoDB stream, should I care about shards or not? Or should I just write the Lambda code and expect that the AWS environment passes batches of records to that Lambda?


2 Answers

2 votes

When using Lambda to consume a DynamoDB Stream, the work of polling the API and keeping track of shards is handled for you automatically. If your table has multiple shards, then multiple Lambda invocations will run concurrently. From your perspective as a developer, you just have to write the code for your Lambda function and the rest is taken care of for you.

In-order processing is still guaranteed by DynamoDB Streams, so with a single shard only one instance of your Lambda function will be invoked at a time. However, with multiple shards you may see multiple instances of your Lambda function running at the same time. This fan-out is transparent and may cause issues or lead to surprising behavior if you are not aware of it while coding your Lambda function.
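To make this concrete, here is a minimal sketch of what "just write the code" looks like. The event shape (`Records`, `eventName`, `dynamodb.NewImage`/`OldImage`) is the standard DynamoDB Streams event format that Lambda hands you; there is no shard handling anywhere in the function.

```python
# Minimal sketch of a Lambda handler for a DynamoDB Streams event source.
# Lambda delivers a batch of records per invocation; shard tracking,
# checkpointing, and ordering within a shard are handled by AWS.

def lambda_handler(event, context):
    for record in event["Records"]:
        event_name = record["eventName"]  # INSERT, MODIFY, or REMOVE
        if event_name in ("INSERT", "MODIFY"):
            # Attribute values arrive in DynamoDB's typed JSON format,
            # e.g. {"S": "some-string"} for a string attribute.
            new_image = record["dynamodb"].get("NewImage", {})
            print(event_name, new_image)
        elif event_name == "REMOVE":
            old_image = record["dynamodb"].get("OldImage", {})
            print(event_name, old_image)
    return {"processed": len(event["Records"])}
```

Whether `NewImage`/`OldImage` are present depends on the stream view type you configured (`KEYS_ONLY`, `NEW_IMAGE`, `OLD_IMAGE`, or `NEW_AND_OLD_IMAGES`).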

For a deeper explanation of how this works I'd recommend the YouTube video AWS re:Invent 2016: Real-time Data Processing Using AWS Lambda (SVR301). While the focus is mostly on Kinesis Streams the same concepts for consuming DynamoDB Streams apply as the technology is nearly identical.

1 vote

We use DynamoDB to process close to a billion records every day, auto-expire those records, and send them to streams.

Everything is taken care of by AWS; we don't need to do anything except configure the stream (which type of image you want) and add triggers.

The only fine-tuning we did: as data volume grew, we increased the batch size so records are processed faster and the number of Lambda invocations is reduced.
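As a sketch of that tuning, the batch size lives on the event source mapping between the stream and the function and can be raised with the AWS CLI. The UUID below is a placeholder for your own mapping's ID, and 500 is just an illustrative value:

```shell
# Find the mapping's UUID for your function, then raise its batch size.
# (UUID and batch size here are placeholders, not recommendations.)
aws lambda list-event-source-mappings --function-name my-stream-consumer
aws lambda update-event-source-mapping \
    --uuid "01234567-89ab-cdef-0123-456789abcdef" \
    --batch-size 500
```

A larger batch size means fewer invocations per unit of data, at the cost of more records to handle (and potentially retry) per invocation.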

If you are using any external process to iterate over the stream, you might need to do the same.
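For comparison, this is roughly what such an external process has to do by hand with the low-level streams API: list the shards, get an iterator per shard, and page through records. All function names here are real boto3 `dynamodbstreams` operations, but the overall loop is a simplified sketch (a production consumer would also handle shard splits, new shards, and resuming from a checkpoint), which is exactly the work the Lambda trigger spares you.

```python
def read_all_records(streams, stream_arn, max_pages=1000):
    """Walk every shard of a DynamoDB Stream and collect its records.

    `streams` is a boto3 "dynamodbstreams" client (passed in so the
    function is easy to test); `stream_arn` is the stream's ARN.
    """
    records = []
    description = streams.describe_stream(StreamArn=stream_arn)
    for shard in description["StreamDescription"]["Shards"]:
        iterator = streams.get_shard_iterator(
            StreamArn=stream_arn,
            ShardId=shard["ShardId"],
            ShardIteratorType="TRIM_HORIZON",  # start from the oldest record
        )["ShardIterator"]
        # Guard with max_pages: open shards keep returning iterators
        # indefinitely, so a real consumer needs its own stop condition.
        for _ in range(max_pages):
            page = streams.get_records(ShardIterator=iterator)
            records.extend(page["Records"])
            iterator = page.get("NextShardIterator")  # absent once a shard is closed and drained
            if not iterator:
                break
    return records

# Usage (assumes AWS credentials and a real stream ARN):
# import boto3
# streams = boto3.client("dynamodbstreams")
# records = read_all_records(streams, "arn:aws:dynamodb:...:table/T/stream/...")
```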

Reference:

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html

Hope it helps.