2 votes

I am trying to build a process that invokes an AWS Lambda function, which then uses AWS SNS to publish messages that trigger more Lambdas. Each triggered Lambda writes an output file to S3. The process is depicted below:


My question is this: how can I know that all the Lambdas are done writing their files? I want to execute another process that collects all these files and merges them. I can think of two obvious ways:

  1. Constantly monitor S3 until the number of output files matches the number of SNS messages, then invoke the final merging Lambda.
  2. Use a database as a synchronization source: write a count for that particular job/session and keep monitoring it until it reaches the SNS message count.

Both solutions require constant polling, which I would like to avoid. I want to do this in an event-driven manner. I was hoping Amazon SQS would come to my rescue with some sort of "empty queue Lambda trigger", but SQS only supports triggering Lambdas on new messages. Is there any known way to achieve this in an event-driven manner in AWS? Your suggestions/comments/answers are much appreciated.

4
AWS Step Functions. - jarmod
any other alternatives? - Jay
If you know the number of things in advance, you could initialize a counter in DynamoDB and then atomically decrement it as work completes. Use DynamoDB Streams to trigger Lambda invocation when the counter is mutated, and trigger your next phase (or end of work) when the counter hits zero. I have not personally implemented this but it may be worth investigating. - jarmod
@jarmod The trigger on DynamoDB Streams seems to fire on creation of new records. There seems to be no way to trigger on the event when the counter hits zero. I think the best solution would have been an SQS queue-purge event, but unfortunately SQS doesn't offer any such event trigger! - Jay
Whenever an application creates, updates, or deletes items in the table, DynamoDB Streams writes a stream record. - jarmod

4 Answers

3 votes

I would propose a couple of options here:

Step Functions:

This is a managed service for state machines. It's great for coordinating workflows.

Atomic Counting:

If you know the number of things in advance, you could initialize an Atomic Counter in DynamoDB and then atomically decrement it as work completes. Use DynamoDB Streams to trigger Lambda invocation when the counter is mutated, and trigger your next phase (or end of work) when the counter hits zero. Note that whenever an application creates, updates, or deletes items in the table, DynamoDB Streams writes a stream record, so every mutation of the counter would trigger your Lambda.

Note that DynamoDB Streams guarantees the following:

  • Each stream record appears exactly once in the stream.

  • For each item that is modified in a DynamoDB table, the stream records appear in the same sequence as the actual modifications to the item.
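The decrement-and-stream pattern above can be sketched as follows. This is a minimal sketch, not a full implementation: the table name `jobs`, the attribute name `remaining`, and the returned sentinel strings are assumptions for illustration, and the stream must be configured with the `NEW_IMAGE` (or `NEW_AND_OLD_IMAGES`) view type for `NewImage` to be present in the records.

```python
def decrement_params(table_name, job_id):
    """Build the arguments for dynamodb.update_item (boto3 assumed) that
    atomically subtract 1 from the remaining-work counter."""
    return {
        "TableName": table_name,
        "Key": {"job_id": {"S": job_id}},
        "UpdateExpression": "ADD remaining :dec",
        "ExpressionAttributeValues": {":dec": {"N": "-1"}},
        "ReturnValues": "UPDATED_NEW",
    }


def stream_handler(event, context=None):
    """Lambda handler for the DynamoDB Streams trigger.

    Every mutation of the counter lands here as a MODIFY record; the next
    phase fires only when the counter has reached zero."""
    for record in event.get("Records", []):
        if record.get("eventName") != "MODIFY":
            continue
        new_image = record["dynamodb"].get("NewImage", {})
        remaining = int(new_image.get("remaining", {}).get("N", "-1"))
        if remaining == 0:
            # In a real deployment this is where you would invoke the
            # merge Lambda (e.g. via the boto3 Lambda client).
            return "start-merge"
    return "wait"
```

Because each stream record appears exactly once and in modification order (per the guarantees above), exactly one invocation will observe the zero value.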

2 votes

AWS Step Functions (a managed state machine service) would be the obvious choice. AWS has some examples as starting points; I remember one being a looping state that you could probably apply to this use case.
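For a fan-out/fan-in workflow like this, a `Map` state is a natural fit: it runs one iteration per input file and only proceeds to the next state once every iteration has finished. A minimal Amazon States Language sketch (the function ARNs and the `$.files` input path are placeholders, not values from the question):

```json
{
  "StartAt": "FanOut",
  "States": {
    "FanOut": {
      "Type": "Map",
      "ItemsPath": "$.files",
      "Iterator": {
        "StartAt": "WriteFile",
        "States": {
          "WriteFile": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:WriteFile",
            "End": true
          }
        }
      },
      "Next": "Merge"
    },
    "Merge": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:Merge",
      "End": true
    }
  }
}
```

The "all files written" synchronization is then handled entirely by the state machine, with no polling and no counter to maintain.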

Another idea off the top of my head...

Create an "Orchestration Lambda" that has the list of your files...

  1. Orchestration Lambda invokes a "File Writer Lambda" in a loop, passing the file info. The AWS SDK for Java's invokeAsync(InvokeRequest request) returns a Future object, so the Orchestration Lambda can check each future's state for completion.

  2. Orchestration Lambda can make a similar call to the "File Writer Lambda" but instead use the more flexible method: invokeAsync(InvokeRequest request, AsyncHandler asyncHandler). You can make an inner class that implements this AsyncHandler and monitor the completion there in the Orchestration Lambda. It is a little cleaner than all the loops.

There are probably many ways to solve this problem, but those are two ideas.
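The orchestrator idea can be sketched locally with Python's concurrent.futures as an analogue of the Java SDK's invokeAsync futures. Here write_file is a hypothetical stand-in for the real Lambda invocation (which would go through the boto3 Lambda client in practice), and the S3 key format is an assumption for illustration:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def write_file(file_info):
    # Stand-in for invoking the "File Writer Lambda" with file_info;
    # returns the key the worker would have written.
    return f"s3://output/{file_info}"


def orchestrate(files):
    """Fan out one invocation per file, wait for every future to
    complete, then return the written keys so the merge step can run."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(write_file, f) for f in files]
        return [fut.result() for fut in as_completed(futures)]
```

One caveat with this pattern: the orchestrator stays running for the whole fan-out, so the entire job must fit inside the Lambda execution time limit.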

1 vote

Personally, I prefer the idea with "Step Functions".

But if you want to simplify your architecture, you could create a triggered Lambda function. Choose 'S3 trigger' on the left side of the Lambda function designer and configure it below.

Check out Using AWS Lambda with Amazon S3 for more.

But in this case you have to create a more sophisticated Lambda function which checks that all the appropriate files have been uploaded to S3, and only then starts your merge.
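That completeness check might look like the sketch below. EXPECTED_COUNT, the key layout, and the returned sentinel strings are assumptions for the example; list_keys is injected for testability and would wrap an S3 list call (e.g. boto3's list_objects_v2) in a real deployment:

```python
EXPECTED_COUNT = 3  # assumed number of output files per job


def s3_trigger_handler(event, list_keys):
    """Runs on every S3 ObjectCreated event; starts the merge only once
    all expected output files for the job's prefix are present."""
    record = event["Records"][0]
    key = record["s3"]["object"]["key"]
    prefix = key.rsplit("/", 1)[0]      # e.g. "job1/part-2" -> "job1"
    keys = list_keys(prefix)            # would call S3 in production
    if len(keys) >= EXPECTED_COUNT:
        return "merge"
    return "not-yet"
```

Since every upload fires the trigger, exactly one invocation (the one that sees the final file) will return the merge signal, which keeps the design event-driven.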

1 vote

The stated problem seems a suitable candidate for the Saga Pattern. Basically, a Saga describes any long-running, distributed process.

As mentioned earlier, the AWS platform allows using Step Functions to implement a Saga, as described here.