
Assume we're using AWS triggers on a DynamoDB table, and that the trigger runs a Lambda function whose job is to update the corresponding entry in CloudSearch (to keep DynamoDB and CloudSearch in sync).

I'm not clear on how Lambda would always keep the data in CloudSearch in sync with the data in DynamoDB. Consider the following flow:

  1. The application updates a DynamoDB table's record A (say, to A1)
  2. Very shortly after that, the application updates the same record A again (to A2)
  3. The trigger for update 1 causes Lambda invocation 1 to start executing
  4. The trigger for update 2 causes Lambda invocation 2 to start executing
  5. Step 4 completes first, so CloudSearch sees A2
  6. Now step 3 completes, so CloudSearch sees A1

Lambda invocations are not guaranteed to start only after the previous invocation is complete (correct me if I'm wrong, and please provide a link).

As we can see, the data goes out of sync.
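
For concreteness, here is a minimal sketch (my assumption of the setup, not code from the question) of what each triggered Lambda would do: overwrite the record's document in CloudSearch with the new image. The domain endpoint, the "id" key, and the "title" field are hypothetical. Note there is no version check, so whichever invocation finishes last wins:

    import json
    import boto3

    # CloudSearch document-service client; the endpoint is domain-specific
    # and hypothetical here.
    cs = boto3.client(
        "cloudsearchdomain",
        endpoint_url="https://doc-mydomain.us-east-1.cloudsearch.amazonaws.com",
    )

    def upsert(doc_id, fields):
        # A one-document SDF batch: a plain last-writer-wins overwrite.
        # If invocation order is not guaranteed, a stale image (A1) can
        # land after a newer one (A2).
        batch = [{"type": "add", "id": doc_id, "fields": fields}]
        cs.upload_documents(
            documents=json.dumps(batch),
            contentType="application/json",
        )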

The closest thing I can think of that would work is AWS Kinesis Streams, restricted to a single shard (with its 1 MB/s ingestion limit). Within that restriction, the consumer application can be written so that records are processed strictly sequentially, i.e., the next record is processed only after the previous record has been put into CloudSearch. Assuming that statement is true, how do we ensure the sync happens correctly when so much data is ingested into DynamoDB that more than one Kinesis shard is needed?
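
A minimal sketch of that single-shard sequential consumer, assuming a hypothetical stream name and an index_document helper standing in for the CloudSearch upload; a real consumer would also checkpoint SequenceNumber (e.g., via the Kinesis Client Library) so it can resume where it stopped:

    import time
    import boto3

    kinesis = boto3.client("kinesis")

    def index_document(data):
        # Hypothetical helper: upsert one change record into CloudSearch.
        print("indexing", data)

    def consume(stream_name="table-changes", shard_id="shardId-000000000000"):
        iterator = kinesis.get_shard_iterator(
            StreamName=stream_name,
            ShardId=shard_id,
            ShardIteratorType="TRIM_HORIZON",
        )["ShardIterator"]
        while iterator:
            resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
            for record in resp["Records"]:
                # Strictly sequential: do not advance to the next record
                # until this one is safely indexed.
                index_document(record["Data"])
            iterator = resp["NextShardIterator"]
            time.sleep(1)  # stay under the shard's read-throughput limit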


2 Answers


You may achieve that using DynamoDB Streams:

DynamoDB Streams

"A DynamoDB stream is an ordered flow of information about changes to items in an Amazon DynamoDB table."

DynamoDB Streams guarantees the following:

  • Each stream record appears exactly once in the stream.
  • For each item that is modified in a DynamoDB table, the stream records appear in the same sequence as the actual modifications to the item.

Another cool thing about DynamoDB Streams: if your Lambda fails to handle a stream record (any error when indexing into CloudSearch, for example), the event will keep retrying, and the remaining stream records will wait until your invocation succeeds.

We use Streams to keep our Elasticsearch indexes in sync with our DynamoDB tables.
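
As a rough sketch of that pattern (this assumes a stream view type that includes new images, a single string key named "id", and a hypothetical "title" field; it targets CloudSearch to match the question, but the same shape works for Elasticsearch):

    import json
    import boto3

    cs = boto3.client(
        "cloudsearchdomain",
        endpoint_url="https://doc-mydomain.us-east-1.cloudsearch.amazonaws.com",
    )

    def handler(event, context):
        batch = []
        for record in event["Records"]:
            doc_id = record["dynamodb"]["Keys"]["id"]["S"]
            if record["eventName"] == "REMOVE":
                batch.append({"type": "delete", "id": doc_id})
            else:
                # INSERT and MODIFY events carry the item's new image.
                image = record["dynamodb"]["NewImage"]
                batch.append({
                    "type": "add",
                    "id": doc_id,
                    "fields": {"title": image["title"]["S"]},
                })
        # If this raises, Lambda retries the same batch before moving on
        # in the shard, which is the blocking/retry behaviour described above.
        cs.upload_documents(
            documents=json.dumps(batch),
            contentType="application/json",
        )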


AWS Lambda FAQ Link

Q: How does AWS Lambda process data from Amazon Kinesis streams and Amazon DynamoDB Streams?

The Amazon Kinesis and DynamoDB Streams records sent to your AWS Lambda function are strictly serialized, per shard. This means that if you put two records in the same shard, Lambda guarantees that your Lambda function will be successfully invoked with the first record before it is invoked with the second record. If the invocation for one record times out, is throttled, or encounters any other error, Lambda will retry until it succeeds (or the record reaches its 24-hour expiration) before moving on to the next record. The ordering of records across different shards is not guaranteed, and processing of each shard happens in parallel.

So that means Lambda picks up the records in a shard one by one, in the order they appear in the shard, and does not process a new record until the previous record has been processed!

However, one problem remains: what if stream entries for the same record end up in different shards? Thankfully, DynamoDB Streams ensures that a given primary key always resides in a single active shard. (Essentially, I think, the primary key is hashed to pick the shard.) AWS Slide Link. See more from the AWS Blog below:

The relative ordering of a sequence of changes made to a single primary key will be preserved within a shard. Further, a given key will be present in at most one of a set of sibling shards that are active at a given point in time. As a result, your code can simply process the stream records within a shard in order to accurately track changes to an item.
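
To make that concrete, here is a small sketch (assuming a single string key named "id" and a stream view that includes new images). Because the records for one key arrive in modification order within a shard, a simple in-order fold over the batch ends at the item's latest state, so the A1-overwrites-A2 race from the question cannot happen:

    def latest_images(event):
        # Fold a stream batch down to the newest image per key.
        latest = {}
        for record in event["Records"]:
            doc_id = record["dynamodb"]["Keys"]["id"]["S"]
            # Records for the same key are in modification order, so the
            # last one seen wins (None after a REMOVE).
            latest[doc_id] = record["dynamodb"].get("NewImage")
        return latest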