1
votes

Earlier, I created an ingestion service built around master and slave lambdas. The master lambda is invoked in response to S3 events; it reads a CSV file from S3 and splits it into chunks. The master lambda then invokes multiple slave lambdas asynchronously. Each slave lambda processes one chunk of data and writes it to DynamoDB.
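For readers following along, the master side of this design can be sketched roughly as below. This is a minimal sketch, not the code actually used: the function name `ingest-worker`, the chunk size of 500, and the payload shape `{"rows": [...]}` are all assumptions made for illustration.

```python
import csv
import io
import json
from urllib.parse import unquote_plus


def chunk_rows(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def master_handler(event, context, chunk_size=500, worker_name="ingest-worker"):
    # worker_name and chunk_size are hypothetical defaults.
    # boto3 is imported here so chunk_rows can be exercised without AWS access.
    import boto3

    s3 = boto3.client("s3")
    lambda_client = boto3.client("lambda")

    # S3 event notifications carry the bucket name and a URL-encoded object key.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = unquote_plus(record["object"]["key"])

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    rows = list(csv.DictReader(io.StringIO(body)))

    invoked = 0
    for chunk in chunk_rows(rows, chunk_size):
        # InvocationType="Event" makes the call asynchronous (fire and forget).
        lambda_client.invoke(
            FunctionName=worker_name,
            InvocationType="Event",
            Payload=json.dumps({"rows": chunk}).encode("utf-8"),
        )
        invoked += 1
    return {"chunks_invoked": invoked}
```

Asynchronous invocations have a payload size limit, so the chunk size has to be small enough for each serialized chunk to fit.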

With this design, I am able to invoke multiple instances of the slave lambda and achieve parallelism.

Later, I read about Step Functions (SF), which orchestrate multiple AWS services to accomplish a task. Now I am thinking of redesigning my ingestion service with SF. With the help of the Map state, it is very convenient to achieve parallelism: https://aws.amazon.com/blogs/aws/new-step-functions-support-for-dynamic-parallelism/ However, I am not sure how useful it will be, since this is a new feature and seems tightly coupled.
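To make the Map state idea concrete, here is a minimal sketch of a state machine definition that fans out over a list of chunks. It is built as a Python dict and serialized to Amazon States Language JSON. The ARN, the state names, and the `$.chunks` input path are hypothetical; the `Iterator` field follows the linked blog post.

```python
import json

# Hypothetical ARN of the worker (slave) lambda.
WORKER_ARN = "arn:aws:lambda:us-east-1:123456789012:function:ingest-worker"


def build_map_definition(worker_arn, max_concurrency=0):
    """Return a state machine definition with a single Map state.

    max_concurrency=0 places no limit on parallel iterations.
    """
    return {
        "StartAt": "ProcessChunks",
        "States": {
            "ProcessChunks": {
                "Type": "Map",
                # Assumes the execution input looks like {"chunks": [...]}.
                "ItemsPath": "$.chunks",
                "MaxConcurrency": max_concurrency,
                "Iterator": {
                    "StartAt": "LoadChunk",
                    "States": {
                        "LoadChunk": {
                            "Type": "Task",
                            "Resource": worker_arn,
                            "End": True,
                        }
                    },
                },
                "End": True,
            }
        },
    }


definition_json = json.dumps(build_map_definition(WORKER_ARN), indent=2)
```

Because state input and output are limited in size, each item in `chunks` would probably need to be a reference to the data (for example an S3 key or a row range) rather than the rows themselves.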

Any suggestions on how to achieve this with SF, or on any alternative approach?

1 Answer

0
votes

First of all, I like your approach of creating separate master and slave lambdas. A common mistake is to write a single "recursive" function that does both chunking and loading depending on the input event. That seems fine at first, but it can lead to infinite loops and expensive bills.

TL;DR: I think there is nothing wrong with your approach, and I would stick with it.

Two further thoughts on the subject:

  1. SFs become more valuable when you need to do something after the load to DynamoDB is done (e.g. delete all loaded items if a lambda fails, or log the total number of items added), or when you want to switch to a serial approach (e.g. to save some WCUs on your DynamoDB table).

  2. AWS Data Pipeline is a tool designed for ETL, and it supports workflows from S3 to DynamoDB. It is worth a look (the official documentation includes a tutorial on ingesting from S3 into DynamoDB).
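As an illustration of the first point, the sketch below shows how a state machine could add a follow-up step after the load, a cleanup path on failure, and an optional serial mode. It is only a sketch under assumptions: the three ARNs, the state names, and the `$.chunks` input path are hypothetical, and the summary and cleanup lambdas would still have to be written.

```python
import json


def build_ingest_definition(worker_arn, summary_arn, cleanup_arn, serial=False):
    """Return a definition with post-load and failure handling around a Map state."""
    return {
        "StartAt": "ProcessChunks",
        "States": {
            "ProcessChunks": {
                "Type": "Map",
                "ItemsPath": "$.chunks",
                # 1 processes chunks one at a time (gentler on WCUs); 0 means no limit.
                "MaxConcurrency": 1 if serial else 0,
                "Iterator": {
                    "StartAt": "LoadChunk",
                    "States": {
                        "LoadChunk": {"Type": "Task", "Resource": worker_arn, "End": True}
                    },
                },
                # Collect per-chunk results so the next state can total them.
                "ResultPath": "$.results",
                # If any iteration fails, route to the cleanup branch.
                "Catch": [
                    {"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "Cleanup"}
                ],
                "Next": "LogSummary",
            },
            "LogSummary": {"Type": "Task", "Resource": summary_arn, "End": True},
            "Cleanup": {"Type": "Task", "Resource": cleanup_arn, "Next": "IngestFailed"},
            "IngestFailed": {
                "Type": "Fail",
                "Error": "IngestFailed",
                "Cause": "A chunk failed to load; cleanup was attempted.",
            },
        },
    }


# Placeholder ARNs for illustration only.
serial_definition = json.dumps(
    build_ingest_definition("arn:worker", "arn:summary", "arn:cleanup", serial=True),
    indent=2,
)
```

With the plain master/slave design, the same behaviour would need extra bookkeeping (for instance a counter of completed chunks), which is the main thing SF would take off your hands.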

The way you built the service seems fine to me, and I wouldn't spend time changing it if you are happy with it. Keep the benefits of the different designs in mind, but avoid over-optimizing in the early stages.