How do you delete multiple S3 files based on a Last Modified date condition?

I have this folder structure on S3.

  • dentca-lab-dev-sample
    • 2019-03-13
      • file1 Last modified: Mar 13, 2019 2:34:06 PM GMT-0700
      • file2 Last modified: Mar 13, 2019 3:18:01 PM GMT-0700
      • file3 Last modified: Mar 13, 2019 2:34:30 PM GMT-0700
      • file4 Last modified: Mar 13, 2019 2:32:40 PM GMT-0700

I want to delete the files (this is just a sample) that were last modified before Mar 13, 2019 2:34:30 PM.

So I wrote this bash command, but it's not working:

aws s3 ls --recursive s3://dentca-lab-dev-sample/2019-03-13/ | awk '$1 <= "2019-03-13 14:34:30" {print $4}'

Note: ls is just for testing; I will change it to rm.

I also ran this command for testing:

aws s3 ls --recursive s3://dentca-lab-dev-sample/2019-03-13/

output:

2019-03-13 14:34:06   11656584 2019-03-13/mandibular.stl
2019-03-13 15:18:01   11969184 2019-03-13/maxillary.stl
2019-03-13 14:34:30    9169657 2019-03-13/obj.obj
2019-03-13 14:32:40   15690284 2019-03-13/upperAIO_50005.stl

But when I add the awk condition, it doesn't work. Maybe that's because $1 only captures the first field, 2019-03-13, and I'm comparing it to 2019-03-13 14:34:30.

I also tried awk '$1 $2 <= "2019-03-13 14:34:30" {print $4}' to capture the second field, but still got nothing. This is my first bash script, by the way.

Thank you! I used this as a reference: aws cli s3 bucket remove object with date condition

1 Answer

You can use this to obtain a list of objects with a LastModified before a given date:

aws s3api list-objects --bucket my-bucket --query "Contents[?LastModified<='2019-03-13'].[Key]" --output text

Note that this uses s3api rather than s3; s3api exposes more information about each object.

You could then take the results and pump them into aws s3 rm to delete the objects.
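As for why the awk attempts fail: $1 holds only the date, and $1 $2 concatenates the two fields with no space between them, so neither matches the "YYYY-MM-DD HH:MM:SS" string it is compared against. Here is a rough sketch of the same filter in Python, assuming the four-column aws s3 ls --recursive output shown in the question (date, time, size, key) and treating the timestamps as naive local times. The function name keys_modified_before is made up for illustration.

```python
from datetime import datetime


def keys_modified_before(ls_output, cutoff):
    """Return keys from `aws s3 ls --recursive` text whose timestamp is before cutoff."""
    keys = []
    for line in ls_output.splitlines():
        # Split into date, time, size, key; maxsplit keeps keys with spaces intact
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        modified = datetime.strptime(parts[0] + " " + parts[1], "%Y-%m-%d %H:%M:%S")
        if modified < cutoff:
            keys.append(parts[3])
    return keys


# Sample listing copied from the question
sample = """\
2019-03-13 14:34:06   11656584 2019-03-13/mandibular.stl
2019-03-13 15:18:01   11969184 2019-03-13/maxillary.stl
2019-03-13 14:34:30    9169657 2019-03-13/obj.obj
2019-03-13 14:32:40   15690284 2019-03-13/upperAIO_50005.stl
"""

print(keys_modified_before(sample, datetime(2019, 3, 13, 14, 34, 30)))
# → ['2019-03-13/mandibular.stl', '2019-03-13/upperAIO_50005.stl']
```

The comparison is strict, so obj.obj (modified at exactly 14:34:30) is kept. Change < to <= if the cutoff itself should be included.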

Frankly, if you wish to get fine-grained like this, I would recommend using Python instead of bash. It would be something like:

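import boto3
from datetime import datetime, timezone

s3 = boto3.client('s3', region_name='ap-southeast-2')
response = s3.list_objects_v2(Bucket='my-bucket')

# LastModified is timezone-aware, so the cutoff must be timezone-aware too
cutoff = datetime(2019, 3, 13, tzinfo=timezone.utc)
keys_to_delete = [{'Key': obj['Key']} for obj in response.get('Contents', []) if obj['LastModified'] < cutoff]
# Skip the delete call when nothing matched
if keys_to_delete:
    s3.delete_objects(Bucket='my-bucket', Delete={'Objects': keys_to_delete})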
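One caveat: list_objects_v2 returns at most 1000 keys per call, and delete_objects accepts at most 1000 keys per request, so a larger prefix needs pagination. Below is a minimal sketch of one way to handle that with boto3's paginator. The helper name delete_older_than and the constant EXAMPLE_CUTOFF are hypothetical, and the client is passed in as an argument so that the function makes no assumptions about region or credentials.

```python
from datetime import datetime, timezone

# 2:34:30 PM GMT-0700 from the question, expressed in UTC (illustrative value)
EXAMPLE_CUTOFF = datetime(2019, 3, 13, 21, 34, 30, tzinfo=timezone.utc)


def delete_older_than(s3_client, bucket, prefix, cutoff):
    """Delete objects under prefix whose LastModified is before cutoff.

    cutoff must be timezone-aware, because boto3 returns aware datetimes.
    Returns the list of keys that were sent for deletion.
    """
    deleted = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page holds at most 1000 objects, so each batch fits one delete_objects call
        batch = [
            {"Key": obj["Key"]}
            for obj in page.get("Contents", [])
            if obj["LastModified"] < cutoff
        ]
        if batch:
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": batch})
            deleted.extend(item["Key"] for item in batch)
    return deleted

# Usage (needs boto3 and AWS credentials):
#   import boto3
#   delete_older_than(boto3.client("s3"), "my-bucket", "2019-03-13/", EXAMPLE_CUTOFF)
```

Before deleting anything for real, it is worth printing the matched keys first, in the same spirit as testing with ls before switching to rm.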