
Currently in the environment I am building out, a file gets submitted to an S3 bucket from a web form as a JSON file. When that file arrives in the S3 bucket, it triggers AWS Lambda to turn on an EC2 instance. That EC2 instance needs to download the file from the S3 bucket and feed it into an application. Once the application is done, a file gets submitted to an S3 bucket, which triggers AWS Lambda again to turn off the EC2 instance.

What is the best way to download that file? These are some ideas I had, but I am unsure if they are possible/practical:

  1. Pass the S3 object ID of the file which triggered AWS Lambda to EC2 (with Lambda)
  2. Pass the S3 object ID of the most recently uploaded file in an S3 bucket to EC2
  3. Get the S3 object ID of the only file in a bucket (Ex: mybucket/uploaded_files/) using AWS CLI

Once the EC2 instance is passed the object ID, it will download it. Thoughts/opinions? Thanks.

Edit for clarification: The file getting uploaded to the S3 bucket is a JSON file containing configuration options for the application. There will only ever be a single EC2 instance (because this is a side project).
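For what it's worth, option 1 from the list could look roughly like this on the Lambda side, stashing the object location somewhere the instance can read at boot, e.g. an SSM parameter. This is only a sketch; the parameter name and instance ID are placeholders:

```python
def object_uri(event):
    """Extract s3://bucket/key from the S3 event that invoked the Lambda."""
    record = event["Records"][0]["s3"]
    return f"s3://{record['bucket']['name']}/{record['object']['key']}"

def handler(event, context):
    import boto3  # available by default in the Lambda runtime
    ssm = boto3.client("ssm")
    ec2 = boto3.client("ec2")
    # Stash the object location where the instance can read it at boot.
    ssm.put_parameter(
        Name="/myapp/pending-config",   # hypothetical parameter name
        Value=object_uri(event),
        Type="String",
        Overwrite=True,
    )
    ec2.start_instances(InstanceIds=["i-0123456789abcdef0"])  # placeholder ID
```

The instance would then call `ssm get-parameter` in its startup script and download the referenced object.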

Have you considered processing the file with a Lambda function? - MCI
@MCI The file is actually just a JSON file with configuration options for the main application running on EC2. The JSON file that gets uploaded to the S3 bucket is what needs to get passed/pulled down to the EC2 instance so the application knows what to do. I will edit my original question to clarify. - HenryHuevo

2 Answers


Probably the most scalable and fault-tolerant way of doing this is through SQS.

In this scenario your Lambda passes the object ID to a dedicated SQS queue and launches the instance. The instance, upon boot, reads messages from the queue and processes the files, which includes downloading them.
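The instance-side worker could be sketched like this, assuming the Lambda puts a small JSON body with the bucket and key on the queue (queue URL, message shape, and local path are all assumptions):

```python
import json

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/config-uploads"  # placeholder

def parse_message(body):
    """Pull (bucket, key) out of a message body like '{"bucket": ..., "key": ...}'."""
    data = json.loads(body)
    return data["bucket"], data["key"]

def drain_queue():
    import boto3  # assumed installed on the instance
    sqs = boto3.client("sqs")
    s3 = boto3.client("s3")
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,  # long polling
        )
        messages = resp.get("Messages", [])
        if not messages:
            break  # queue drained; safe for the shutdown Lambda to stop the instance
        for msg in messages:
            bucket, key = parse_message(msg["Body"])
            s3.download_file(bucket, key, "/tmp/config.json")
            # ...hand /tmp/config.json to the application here...
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```

Deleting the message only after a successful download is what makes this fault-tolerant: if the instance dies mid-process, the message becomes visible again and gets retried.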


I wouldn't couple the EC2 start/stop with the creation of each file if the load is high enough, e.g. files are being created faster than the startup time of your instance.

I would use S3 events to send messages to SNS, with SQS subscribed to the topic. I almost always put SNS in front of SQS, as it gives you a nice place to filter and fork messages.
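Wiring up the SNS-to-SQS leg with boto3 might look something like the following (topic and queue names are made up; the S3-bucket-to-SNS notification is configured separately on the bucket):

```python
import json

def sns_to_sqs_policy(topic_arn, queue_arn):
    """Queue policy allowing the given SNS topic to deliver to the queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    }

def wire_up(topic_name="file-uploads", queue_name="file-processing"):
    import boto3  # run this wherever you do your provisioning
    sns = boto3.client("sns")
    sqs = boto3.client("sqs")
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    queue_url = sqs.create_queue(QueueName=queue_name)["QueueUrl"]
    queue_arn = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]
    # SNS can't deliver to the queue until the queue policy allows it.
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={"Policy": json.dumps(sns_to_sqs_policy(topic_arn, queue_arn))},
    )
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)
```

The easy-to-miss step is the queue policy: without it, the subscription exists but SNS silently fails to deliver.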

I would then have my processing service listening to this SQS queue. I would also place my processing service into an Auto Scaling group. Then you could control the scaling of the group by SQS queue depth.

You could also use a Beanstalk Worker environment to deal with all the EC2 scaling and the SQS subscription.

Using Auto Scaling groups it's very possible to end up with a 'faster' app at a lower cost. For example, you may find things run faster on 2x t3.large vs 1x t3.xlarge, when the cost of a t3.large is half that of a t3.xlarge. BUT that greatly depends on your code...

Another idea would be to just do all the processing in a Lambda. You can pack custom executables into your Lambda deployment package. There's a GitHub project showing another way using ClamAV.
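Invoking a bundled executable from a handler is just a subprocess call; here's a minimal sketch (the binary name and paths are hypothetical):

```python
import subprocess

def run_tool(argv):
    """Run a packaged executable and return (exit code, stdout)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode, result.stdout

def handler(event, context):
    # Executables bundled in the deployment package land under /var/task;
    # /tmp is the only writable path in the Lambda environment.
    code, output = run_tool(["/var/task/bin/mytool", "/tmp/input.json"])  # hypothetical binary
    return {"status": code, "output": output}
```

The binary has to be compiled for the Lambda runtime's Linux environment and marked executable before you zip it up.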