4
votes

We have a .NET client application that uploads files to S3. There is an event notification registered on the bucket which triggers a Lambda to process the file. If we need to do maintenance, then we suspend our processing by removing the event notification and adding it back later when we're ready to resume processing.

To process the backlog of files that have queued up in S3 during the period the event notification was disabled, we write a record to a Kinesis stream with the S3 key of each file, and we have an event source mapping that lets Lambda consume each Kinesis record. This works great for us because it allows us to control our concurrency when we are processing a large backlog by controlling the number of shards in the stream. We were originally using SNS, but when we had thousands of files that needed to be reprocessed, SNS would keep starting Lambdas until we hit our concurrent executions threshold, which is why we switched to Kinesis.

The problem we're facing right now is that the cost of Kinesis is killing us, even though we barely use it. We get 150-200 files uploaded per minute, and our Lambda takes about 15 seconds to process each one. If we suspend processing for a few hours we end up with thousands of files to process. We could easily reprocess them with a 128-shard stream, but that would cost us $1,400/month. The current cost of running our Lambda each month is less than $300. It seems terrible that we have to increase our COGS by over 400% just to be able to control our concurrency level during a recovery scenario.
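For reference, the $1,400 figure follows directly from the shard-hour rate. A quick sanity check (assuming the commonly cited ~$0.015 per shard-hour rate; verify against current AWS pricing for your region, and note PUT payload charges come on top):

```python
# Rough Kinesis shard cost estimate.
SHARD_HOUR_RATE = 0.015   # USD per shard-hour (assumed us-east-1 rate)
HOURS_PER_MONTH = 730     # average hours in a month

def monthly_shard_cost(shards: int) -> float:
    """Monthly cost of keeping `shards` shards provisioned, excluding PUT charges."""
    return shards * SHARD_HOUR_RATE * HOURS_PER_MONTH

print(monthly_shard_cost(128))  # ~ $1,400/month for a 128-shard stream
print(monthly_shard_cost(1))    # ~ $11/month for a single shard
```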

I could attempt to keep the stream small by default and resize it on the fly before reprocessing a large backlog, but resizing a stream from 1 shard up to 128 takes an incredibly long time. If we're trying to recover from an unplanned outage, we can't afford to sit around waiting for the stream to resize before we can use it. So my questions are:

  1. Can anyone recommend an alternative pattern to using kinesis shards for being able to control the upper bound on the number of concurrent lambdas draining a queue?

  2. Is there something I am missing which would allow us to use Kinesis more cost efficiently?


1 Answer

0
votes

You can use SQS with Lambda or Worker EC2s.

Here is how it can be achieved (2 approaches):

1. Serverless Approach

  • S3 -> SNS -> SQS -> Lambda Scheduler -> Lambda

  • Use SQS instead of Kinesis for storing the S3 paths

  • Use a Lambda scheduler to keep polling messages (S3 paths) from SQS

  • Invoke the processing Lambda function from the scheduler for each file
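The scheduler pattern above can be sketched as follows. This is a minimal illustration, not a full implementation: `poll_messages` and `invoke_worker` are hypothetical placeholders for the real boto3 `receive_message` and `invoke` calls, and the key idea is that a bounded worker pool caps concurrency the same way a shard count would:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 50  # upper bound on in-flight workers (tune like a shard count)

def drain_queue(poll_messages, invoke_worker):
    """Pull batches of S3 keys from the queue and fan them out to workers,
    never exceeding MAX_CONCURRENCY in-flight invocations at once."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
        while True:
            batch = poll_messages()  # e.g. sqs.receive_message(...) in practice
            if not batch:
                break                # queue drained, stop polling
            for s3_key in batch:
                # e.g. lambda_client.invoke(...) in practice; the pool blocks
                # new submissions once MAX_CONCURRENCY tasks are running
                pool.submit(invoke_worker, s3_key)
```

Raising or lowering `MAX_CONCURRENCY` is instant, which avoids the slow shard-resizing problem described in the question.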

2. EC2 Approach

  • S3 -> SNS -> SQS -> Beanstalk Worker

  • Use SQS instead of Kinesis for storing the S3 paths

  • Use Beanstalk Worker environment which polls SQS automatically

  • Implement the processing logic in the Beanstalk worker, served by a local HTTP server on the same EC2 instance
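In a worker environment, Beanstalk's built-in SQS daemon POSTs each message body to a local HTTP endpoint, so the processing logic reduces to a plain request handler. A minimal stdlib sketch (`process_file` is a hypothetical stand-in for the real download-and-process logic, and the port/path must match the worker environment's configuration); a 200 response tells the daemon to delete the message, while any other status causes a retry:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def process_file(s3_key: str) -> None:
    # Hypothetical placeholder: download the S3 object and run the
    # same processing the Lambda used to perform.
    pass

class WorkerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        s3_key = self.rfile.read(length).decode()  # message body = S3 key
        process_file(s3_key)
        self.send_response(200)  # 200 => the SQS daemon deletes the message
        self.end_headers()

if __name__ == "__main__":
    # The Beanstalk SQS daemon posts to localhost on the configured port/path.
    HTTPServer(("", 8080), WorkerHandler).serve_forever()
```

Concurrency here is bounded by the worker environment's HTTP connection settings and instance count rather than by shard count.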