I have a use case where I programmatically bring up an EC2 instance, copy an executable file from S3, run it, and shut down the instance (done in user-data). I need to get only the last added file from S3. Is there a way to get the last modified file/object from an S3 bucket using the CLI?
5 Answers
You can list all the objects in the bucket with aws s3 ls $BUCKET --recursive:
$ aws s3 ls $BUCKET --recursive
2015-05-05 15:36:17 4 an_object.txt
2015-06-08 14:14:44 16322599 some/other/object
2015-04-29 12:09:29 32768 yet-another-object.sh
They're sorted alphabetically by key, but that first column is the last modified time. A quick sort will reorder them by date:
$ aws s3 ls $BUCKET --recursive | sort
2015-04-29 12:09:29 32768 yet-another-object.sh
2015-05-05 15:36:17 4 an_object.txt
2015-06-08 14:14:44 16322599 some/other/object
tail -n 1 selects the last row, and awk '{print $4}' extracts the fourth column (the name of the object).
$ aws s3 ls $BUCKET --recursive | sort | tail -n 1 | awk '{print $4}'
some/other/object
Last but not least, drop that into aws s3 cp to download the object:
$ KEY=`aws s3 ls $BUCKET --recursive | sort | tail -n 1 | awk '{print $4}'`
$ aws s3 cp s3://$BUCKET/$KEY ./latest-object
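Putting it all together for the original use case, here is a minimal user-data sketch; the bucket name and the shutdown step are assumptions, and note that awk '{print $4}' breaks on keys containing spaces:
#!/usr/bin/env bash
set -euo pipefail
BUCKET="my-bucket"   # assumed bucket name

# Grab the most recently modified key and download it
KEY=$(aws s3 ls "$BUCKET" --recursive | sort | tail -n 1 | awk '{print $4}')
aws s3 cp "s3://$BUCKET/$KEY" ./latest-object

# Run it, then power the instance off
chmod +x ./latest-object
./latest-object
shutdown -h now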
Update: after a while, here is a slightly more elegant way to do it:
aws s3api list-objects-v2 --bucket "my-awesome-bucket" --query 'sort_by(Contents, &LastModified)[-1].Key' --output=text
Instead of an extra reverse function, we can get the last entry from the list via [-1].
Old answer:
This command does the job without any external dependencies:
aws s3api list-objects-v2 --bucket "my-awesome-bucket" --query 'reverse(sort_by(Contents, &LastModified))[:1].Key' --output=text
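As with the first answer, the resulting key can be fed straight into aws s3 cp; a small sketch reusing the example bucket name:
BUCKET="my-awesome-bucket"
# Newest key by LastModified, then download it
KEY=$(aws s3api list-objects-v2 --bucket "$BUCKET" \
    --query 'sort_by(Contents, &LastModified)[-1].Key' --output=text)
aws s3 cp "s3://$BUCKET/$KEY" ./latest-object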
If this is a freshly uploaded file, you can use Lambda to execute a piece of code on the new S3 object.
If you really need to get the most recent one, you can name your files with the date first, sort by name, and take the first object (a sketch follows below).
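A minimal sketch of that naming scheme, assuming hypothetical keys like 2015-06-08-build.tar.gz:
# Upload with an ISO date prefix so keys sort chronologically
aws s3 cp ./build.tar.gz "s3://$BUCKET/$(date +%F)-build.tar.gz"

# Reverse-sort on the key column and take the first (newest) entry
KEY=$(aws s3 ls "$BUCKET" | sort -k4 -r | head -n 1 | awk '{print $4}')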
Following is a bash script that downloads the latest file from an S3 bucket. I used the AWS S3 sync command instead, so that it would not re-download the file from S3 if it already exists locally.
--exclude excludes all the files
--include includes all the files matching the pattern
#!/usr/bin/env bash
BUCKET="s3://my-s3-bucket-eu-west-1/list/"
# Newest key, relying on the date in the first column of `aws s3 ls`
FILE_NAME=$(aws s3 ls "$BUCKET" | sort | tail -n 1 | awk '{print $4}')
TARGET_FILE_PATH=target/datdump/
TARGET_FILE=${TARGET_FILE_PATH}localData.json.gz
echo "$FILE_NAME"
echo "$TARGET_FILE"
# Sync only the matching file; skips the download if it is already present locally
aws s3 sync "$BUCKET" "$TARGET_FILE_PATH" --exclude "*" --include "*$FILE_NAME*"
cp "target/datdump/$FILE_NAME" "$TARGET_FILE"
p.s. Thanks @David Murray
Use an Event/Lambda on the object that gets triggered on ObjectCreation. Fetching the last object among 2M+ objects using the S3 CLI or API is way too slow. - Vaulstein
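For completeness, wiring up such a notification is itself doable from the CLI; a sketch, assuming the Lambda function already exists (the function name, ARN, account ID, and bucket name below are all placeholders):
# Grant S3 permission to invoke the function (one-time setup)
aws lambda add-permission --function-name my-handler \
    --statement-id s3invoke --action lambda:InvokeFunction \
    --principal s3.amazonaws.com \
    --source-arn arn:aws:s3:::my-awesome-bucket

# Trigger the function on every object creation in the bucket
aws s3api put-bucket-notification-configuration \
    --bucket my-awesome-bucket \
    --notification-configuration '{
      "LambdaFunctionConfigurations": [{
        "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:my-handler",
        "Events": ["s3:ObjectCreated:*"]
      }]
    }'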