0
votes

I'm new to AWS and found out that with the AWS SDK I can't get multiple S3 objects in one request. I could loop over the get requests, but that would take a long time in a single function. I heard that Lambda can run multiple functions at once and that SQS could help me with that.

So how would you set up a Lambda and SQS system that sums all digits found in all files of an S3 bucket?

For example, if I have 6000 files in a bucket, a first Lambda counts them and sends a message to SQS with the number of files. SQS then triggers a Lambda that runs until just before it times out and sends a message to SQS with the sum of digits found in the files it read and the last index it read. That message triggers the next Lambda, and so on until all files are read and summed. The last Lambda returns the total sum.

Or maybe better: the first Lambda fires several parallel Lambdas, each of which adds to a sum stored somewhere upon completion, and in the end the sum is returned to me. Does this sound logical?

1
So you want to write a Lambda function using a specific programming language, retrieve all objects in a given Amazon S3 bucket, read the content, then add up the values, and place the values in an AWS SQS queue? - smac2020
It's unclear what you want to do with SQS. - Mark B
Yes, I want to use Node, to retrieve all objects in a given Amazon S3 bucket, read the content, then add up the values. But I want to make sure that the process runs as fast as it can. So I thought maybe SQS might help; maybe by chunking the objects in S3 and sending a lambda function for each chunk... - Oren Sayag
Classic XY Problem. How many of these files are there? What size are they? How many numbers do they contain, and in what format? How often will you run this process? How fast does a given run need to be? - jarmod
"maybe by chunking the objects in S3": The basic idea is not bad, but queuing will make the processing asynchronous (not necessarily faster). And at the end you need to aggregate the results, which creates additional complexity with your proposed approach. This task (summing over all files) is a good example for the map-reduce approach. - gusto2

1 Answer

1
votes

I heard that Lambda can run multiple functions at once

Lambda can run multiple instances at once, but something needs to invoke the functions and aggregate the results.
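
To illustrate the "invoke and aggregate" part, here is a minimal Node sketch of the fan-out / sum-up pattern. It is only a sketch under some assumptions: "sum of digits" is taken to mean adding up every decimal digit character in the text, and `readObject` and `runWorker` are hypothetical callbacks I made up for the example. In a real setup `readObject` would wrap an S3 GetObject call and `runWorker` could invoke one Lambda per chunk; here both are faked in memory so the snippet runs without any AWS dependency.

```javascript
// Assumption: "sum of digits" means adding every decimal digit character,
// e.g. "a1b22" -> 1 + 2 + 2 = 5.
function sumDigits(text) {
  let sum = 0;
  for (const ch of text) {
    if (ch >= "0" && ch <= "9") sum += Number(ch);
  }
  return sum;
}

// Split a list of object keys into chunks of at most `size` items.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// "Map" step: one worker sums the digits of every object in its chunk.
// `readObject` is a hypothetical callback (key) => Promise<string>.
async function sumChunk(keys, readObject) {
  const bodies = await Promise.all(keys.map((key) => readObject(key)));
  return bodies.reduce((acc, body) => acc + sumDigits(body), 0);
}

// Coordinator: start one worker per chunk in parallel, then "reduce"
// the partial sums into a total. `runWorker` is a hypothetical callback
// (keys) => Promise<number>.
async function sumAllDigits(keys, chunkSize, runWorker) {
  const partials = await Promise.all(chunk(keys, chunkSize).map(runWorker));
  return partials.reduce((a, b) => a + b, 0);
}

// Usage with an in-memory stand-in for the bucket.
const fakeBucket = { "a.txt": "x1y2", "b.txt": "34", "c.txt": "no digits" };
const readObject = async (key) => fakeBucket[key];

sumAllDigits(Object.keys(fakeBucket), 2, (keys) => sumChunk(keys, readObject))
  .then((total) => console.log(total)); // prints 10 (1 + 2 + 3 + 4)
```

The coordinator is the "something" that has to exist: it knows the list of keys, hands out the chunks, and is the only place where the partial sums meet. Note that this starts all chunks at once, so with thousands of files you would probably want to cap how many workers run at the same time.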

and that SQS could help me with that.

SQS can help with a lot of things, but in this case I don't see any reasonable usage

So how would you set up a... system that sums all digits found in all files of an S3 bucket?

If you have a LOT of data and want to process it in parallel, the default choice would be an EMR / Spark cluster. To keep it simple, assuming your S3 data is in a reasonable (supported) format, I'd personally use AWS Athena, which is basically a serverless analytics service.