
I'm trying to set up an Elasticsearch import process from a DynamoDB table. I have already created an AWS Lambda function and enabled a DynamoDB stream with a trigger that invokes my Lambda for every added/updated record. Now I want to perform the initial seed operation (import all records currently in my DynamoDB table into Elasticsearch). How do I do that? Is there any way to make all records in the table be "reprocessed" and added to the stream (so they can be processed by my Lambda)? Or is it better to write a separate function that manually reads all data from the table and sends it to Elasticsearch, so that I basically have two Lambdas: one for the initial data migration (executed only once and triggered manually by me), and another for syncing new records (triggered by DynamoDB stream events)?
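For context, here is a rough sketch of the kind of stream-triggered handler I have in mind (the endpoint, index name, and "id" key attribute are just placeholders for my actual setup):

```python
import boto3
from boto3.dynamodb.types import TypeDeserializer
from elasticsearch import Elasticsearch

# Stream images arrive in DynamoDB's typed JSON ({"S": ...}, {"N": ...});
# TypeDeserializer unwraps them into plain Python values.
deserializer = TypeDeserializer()
es = Elasticsearch(["https://my-es-endpoint:9200"])  # placeholder endpoint

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] in ("INSERT", "MODIFY"):
            image = record["dynamodb"]["NewImage"]
            doc = {k: deserializer.deserialize(v) for k, v in image.items()}
            es.index(index="my-index", id=doc["id"], body=doc)  # "id" is a placeholder key
```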

Thanks for all the help :)


1 Answer


Depending on how large your dataset is, you may not be able to seed your database from within Lambda, as there is a maximum timeout of 300 seconds (EDIT: the limit is now 15 minutes, thanks @matchish).

You could spin up an EC2 instance and use the AWS SDK to run a DynamoDB Scan over the whole table, batch-writing the results to your Elasticsearch instance.
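For example, a rough sketch of that scan-and-bulk-index loop using boto3 and the elasticsearch Python client might look like this (the table name, index name, endpoint, and "id" key are placeholders for whatever your schema uses):

```python
import boto3
from elasticsearch import Elasticsearch, helpers

table = boto3.resource("dynamodb").Table("my-table")  # placeholder table name
es = Elasticsearch(["https://my-es-endpoint:9200"])   # placeholder endpoint

def scan_all():
    kwargs = {}
    while True:
        page = table.scan(**kwargs)  # Scan reads the whole table, one page at a time
        for item in page["Items"]:
            yield {"_index": "my-index", "_id": item["id"], "_source": item}
        if "LastEvaluatedKey" not in page:
            break  # no more pages
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

helpers.bulk(es, scan_all())  # groups documents into batched _bulk requests
```

Keep in mind that a full Scan consumes read capacity on the table, so you may want to throttle it or run it during a quiet period.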

You could also use Amazon EMR to run a MapReduce job that exports the table to S3, and then process all of your data from there.
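Once the EMR job has landed the data in S3, the load into Elasticsearch could look something like the sketch below. It assumes the export was written as newline-delimited JSON under s3://my-bucket/export/ and that each item has an "id" attribute; those names, and the actual export format, depend on how you configure the job.

```python
import json
import boto3
from elasticsearch import Elasticsearch, helpers

s3 = boto3.client("s3")
es = Elasticsearch(["https://my-es-endpoint:9200"])  # placeholder endpoint

def docs():
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket="my-bucket", Prefix="export/"):  # placeholders
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket="my-bucket", Key=obj["Key"])["Body"]
            for line in body.iter_lines():  # one exported item per line (assumed format)
                item = json.loads(line)
                yield {"_index": "my-index", "_id": item["id"], "_source": item}

helpers.bulk(es, docs())
```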