I need to pull data published to an S3 bucket by a different organization (therefore a different AWS account) in a different region, for subsequent processing with Lambda. I do have access to read it but cannot ask them to set up replication to my buckets.

Amazon's Cross-Region Replication looks like it's designed for pushing data from the source and I'm not even sure the source organization has versioning enabled.

Is there a way to pull data? My need is for one-way only; I need to process that data shortly (within 10 minutes or so) after it arrives in the source S3 bucket.

A cron job that runs aws s3 sync every 10 minutes? If you can't get new-object events sent to you from that bucket, I think something like that is going to be the best way to pull from it. - Mark B
Is there a way to run this as a Lambda function? I'm thinking of the cost of running an EC2 instance just to run the sync. Thanks. - wishihadabettername

1 Answer

You could run aws s3 sync on a schedule, such as every 10 minutes. If you want to run this in an AWS Lambda function, check first whether the AWS CLI is actually available in your runtime: as far as I know, the Node.js and Python Lambda environments bundle the AWS SDK (boto3 for Python) but not the CLI itself, so you may have to package the CLI with the function or do the copy through the SDK. Either way, I would suggest writing a short Python Lambda function that performs the sync, and scheduling that function to run every 10 minutes.
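
If the AWS CLI turns out not to be available inside the function, the same pull can be done with boto3, which the Python Lambda runtime bundles. The following is a minimal sketch under several assumptions, not a tested deployment. The environment variable names SOURCE_BUCKET and DEST_BUCKET and the helper names plan_copies and list_objects are made up for illustration. The function's role is assumed to have read access to the source bucket and write access to the destination. The comparison rule (copy when the key is missing, the size differs, or the source copy is newer) only approximates what aws s3 sync does.

```python
import os


def plan_copies(source_objects, dest_objects):
    """Return the sorted keys that should be copied from source to destination.

    Both arguments map object key -> (size, last_modified). A key is copied
    when it is missing from the destination, its size differs, or the source
    object is newer. This is a simplified stand-in for the sync comparison.
    """
    to_copy = []
    for key, (size, modified) in source_objects.items():
        existing = dest_objects.get(key)
        if existing is None or existing[0] != size or modified > existing[1]:
            to_copy.append(key)
    return sorted(to_copy)


def list_objects(s3, bucket, prefix=""):
    """List a bucket via the list_objects_v2 paginator.

    Returns a dict of key -> (size, last_modified).
    """
    listing = {}
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            listing[obj["Key"]] = (obj["Size"], obj["LastModified"])
    return listing


def handler(event, context):
    """Lambda entry point, intended to be invoked by a 10-minute schedule."""
    import boto3  # imported here so the helpers above can be tested without it

    # Hypothetical configuration: bucket names supplied as environment variables.
    source_bucket = os.environ["SOURCE_BUCKET"]
    dest_bucket = os.environ["DEST_BUCKET"]
    s3 = boto3.client("s3")

    to_copy = plan_copies(
        list_objects(s3, source_bucket),
        list_objects(s3, dest_bucket),
    )
    for key in to_copy:
        # Managed server-side copy; the object data does not pass through Lambda.
        s3.copy({"Bucket": source_bucket, "Key": key}, dest_bucket, key)
    return {"copied": len(to_copy)}
```

To run it every 10 minutes, the function could be attached to an EventBridge schedule with the expression rate(10 minutes). Because the buckets are in different regions, the copy call may need a separate client for the source region passed as SourceClient. Listing both buckets on every run gets slow for large buckets, so a key prefix or a larger function timeout may be needed.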