I have set up an SQS queue to which S3 paths are pushed whenever a file is uploaded.
I have also set up a Lambda with an SQS trigger and a batch size of 1.
In my scenario, I have to process n files at a time, say n = 10.
Say there are 100 messages in the queue. My current implementation does the following:
- Whenever there is a message in the input queue, the Lambda is triggered.
- First I check the number of concurrent executions currently active. If I am already running 10 executions, the code simply returns without doing anything. If fewer than 10 are running, it reads one message from the queue and processes it.
- Once the processing is done, the message is manually deleted from the queue.
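For reference, here is a minimal sketch of the handler logic described in the steps above. It is not my exact code: `get_active_executions`, `process_file` and `delete_message` are hypothetical callables standing in for the concurrency check, the file processing and the SQS delete call. One assumption to flag: the sketch raises an exception when the limit is reached instead of returning, because as far as I understand a successful return from an SQS-triggered invocation lets Lambda delete the message itself.

```python
from typing import Any, Callable, Dict, Optional

MAX_CONCURRENT = 10  # "n" from the description above


class ConcurrencyLimitReached(Exception):
    """Signals that the message was NOT processed and should stay in the queue."""


def make_handler(
    get_active_executions: Callable[[], int],   # hypothetical: returns the current execution count
    process_file: Callable[[str], None],        # hypothetical: processes one S3 path
    delete_message: Callable[[str], None],      # hypothetical: deletes by receipt handle
    limit: int = MAX_CONCURRENT,
) -> Callable[[Dict[str, Any], Optional[Any]], Dict[str, str]]:
    """Build a Lambda-style handler around the three injected callables."""

    def handler(event: Dict[str, Any], context: Optional[Any] = None) -> Dict[str, str]:
        # Batch size is 1, so the event carries exactly one SQS record.
        record = event["Records"][0]

        # Step 2: skip the message when the limit is already reached.
        # Assumption: the count is compared with ">=", as in "already running 10".
        if get_active_executions() >= limit:
            raise ConcurrencyLimitReached(
                f"{limit} executions already running; leaving message in the queue"
            )

        # Step 2 (continued): process the S3 path carried in the message body.
        s3_path = record["body"]
        process_file(s3_path)

        # Step 3: delete the message manually once processing is done.
        delete_message(record["receiptHandle"])
        return {"processed": s3_path}

    return handler
```

With this shape, a rejected message is not processed, but it still becomes in-flight until its visibility timeout expires, which is the behaviour described next.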
With the above-mentioned approach, I'm able to process n files at a time. However, say 100 files land in S3 at the same time.
This leads to 100 Lambda invocations. Because of the condition check in the Lambda, the first 10 messages are processed and the remaining 90 messages become in-flight.
Now, when some of the processing is done (say 3 of the 10 have finished), the main queue is still empty because the remaining messages are still in flight.
As per my understanding, if processing a file takes x minutes, the visibility timeout of the messages in the queue should be less than x, so that the messages become available in the queue again.
But this leads to another problem. Say the batch takes longer to complete: the message comes back to the queue, the Lambda is triggered, and the message goes in flight once again.
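To make the trade-off concrete, here is a rough back-of-the-envelope model. It assumes a waiting message is delivered again the instant its visibility timeout expires, and that it is rejected every time until a slot frees up. `redelivery_cost` is a hypothetical helper name, and real SQS/Lambda polling will not be this regular.

```python
import math
from typing import Tuple


def redelivery_cost(slot_free_at: int, visibility_timeout: int) -> Tuple[int, int]:
    """Return (rejected_invocations, idle_seconds) for one waiting message.

    slot_free_at: seconds after the first delivery at which a slot frees up.
    visibility_timeout: queue visibility timeout in seconds.

    Simplified model: delivery attempts happen at t = 0, v, 2v, ... and the
    message is processed at the first attempt with t >= slot_free_at.
    """
    if visibility_timeout <= 0:
        raise ValueError("visibility_timeout must be positive")
    if slot_free_at < 0:
        raise ValueError("slot_free_at must be non-negative")

    # Attempts at 0, v, ..., (k - 1) * v are all rejected.
    rejected = math.ceil(slot_free_at / visibility_timeout)
    picked_up_at = rejected * visibility_timeout
    # Time the free slot sits unused while the message is still hidden.
    idle = picked_up_at - slot_free_at
    return rejected, idle


# A slot frees up after 5 minutes (300 s):
print(redelivery_cost(300, 60))   # short timeout: many wasted invocations
print(redelivery_cost(300, 240))  # long timeout: fewer invocations, idle slot
```

Under this model, a 60 second timeout gives 5 rejected invocations and no idle time, while a 240 second timeout gives 2 rejected invocations but leaves the free slot idle for 180 seconds. Neither end of the range looks good to me.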
Is there any way I can control the number of triggers made to the Lambda? For example, only the first 10 messages should be processed, while the remaining 90 messages should remain visible in the queue. Or is there any other way I can make this design simpler?
I don't want to wait until there are 10 messages. Even if there are only 5 messages, it should process those files. And I don't want to invoke the Lambda on a schedule (e.g., every 5 minutes).