Copy files incrementally from S3 to EBS storage using filters

Question

I want to move a large set of files from an S3 bucket in one AWS account (source), with systematic filenames following this pattern:

my_file_0_0_0.csv
...
my_file_0_7_200.csv

into an S3 bucket in another AWS account (target). The files need to be moved by an EC2 instance (to work around IAM access restrictions) via an attached EBS volume, incrementally (to work around storage limitations).

Clarification:

In the filenames, there are three numbers separated by underscores, in the form _a_b_c, where a is always 0, b runs from 0 to 7, and c runs from 0 to at most 200 (it is not guaranteed to reach 200).

(I have an SSH session to the EC2 instance through PuTTY.)

1st iteration:

In the first iteration, I copy all files from S3 whose names match the pattern my_file_0_0_*.csv. This can be done with the command:

aws s3 cp s3://my_source_bucket_name/my_folder/ . --recursive --exclude "*" --include "my_file_0_0_*" --profile source_user

From here, I upload them to my target bucket with the command:

aws s3 cp . s3://my_target_bucket_name/my_folder/ --recursive --profile source_user

Finally, I delete the files from the EC2 instance's EBS volume with rm *.

2nd iteration:

aws s3 cp s3://my_source_bucket_name/my_folder/ . --recursive --exclude "*" --include "my_file_0_1_*" --profile source_user

This time, I only get some of the files matching my_file_0_1_*, because their combined size reaches 100 GiB, which is the limit of my EBS volume. Here I run into the issue that the filenames are sorted alphabetically rather than numerically by the digits in their names, e.g.:

my_file_0_1_0.csv
my_file_0_1_1.csv
my_file_0_1_10.csv
my_file_0_1_100.csv
my_file_0_1_101.csv
my_file_0_1_102.csv
my_file_0_1_103.csv
my_file_0_1_104.csv
my_file_0_1_105.csv
my_file_0_1_106.csv
my_file_0_1_107.csv
my_file_0_1_108.csv
my_file_0_1_109.csv
my_file_0_1_11.csv

After moving them to the target S3 bucket and removing them from the EBS volume, the challenge is to move the remaining files matching my_file_0_1_* in a systematic way. Is there a way to achieve this, e.g. by using find, grep, awk or similar? And do I need to cast some filename slices to integers first?

BarathVutukuri · Accepted Answer · 2021-06-15T12:15:51

You can use the sort -V command, which orders the embedded numbers numerically (version sort) instead of character by character, and then invoke the copy command on each file one by one or on a list of files at a time.

ls | sort -V
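To see the effect on names like the ones in the question, and to address the "cast to integers" part, here is a small sketch. The function names natural_sort and field_sort are made up for this illustration; the second one is an alternative that avoids -V by sorting the underscore-separated fields numerically.

```shell
#!/bin/sh
# Hypothetical helper names, for illustration only.

# Natural ("version") order. -V is a GNU sort option; check that your
# platform's sort supports it.
natural_sort() {
  sort -V
}

# Alternative without -V: split on "_" and sort fields 4 and 5 numerically.
# For my_file_0_1_10.csv the fields are: my, file, 0, 1, 10.csv.
# The n modifier reads the leading digits of "10.csv" as 10, so no
# explicit integer cast is needed.
field_sort() {
  sort -t_ -k4,4n -k5,5n
}

# Demo on a few names in scrambled order.
printf '%s\n' \
  my_file_0_1_10.csv \
  my_file_0_1_100.csv \
  my_file_0_1_2.csv \
  my_file_0_1_11.csv | natural_sort
```

Both functions should list the files in the order 2, 10, 11, 100 rather than 10, 100, 11, 2.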

If you're on a GNU system, you can also use ls -v. This won't work on macOS.
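Putting this together for the S3 case, one possible approach is to list the objects in the source bucket, sort the names with sort -V, and then move them through the EBS volume one file at a time. The following is only a sketch under several assumptions: the bucket names and the profile are taken from the question, the function names list_group and transfer_group are hypothetical, each single file is assumed to fit on the volume, and the file names are assumed to contain no whitespace.

```shell
#!/bin/sh
# Sketch only. Bucket names and profile come from the question; the
# question uses source_user for both download and upload, so adjust
# PROFILE handling if your upload needs different credentials.
SRC="s3://my_source_bucket_name/my_folder"
DST="s3://my_target_bucket_name/my_folder"
PROFILE="source_user"

# Print the object names for one value of b (my_file_0_<b>_*) in natural
# order. "aws s3 ls" prints date, time, size and name, so the name is
# the fourth column.
list_group() {
  aws s3 ls "$SRC/my_file_0_${1}_" --profile "$PROFILE" \
    | awk '{print $4}' \
    | sort -V
}

# Download, upload and delete one file at a time, so the volume never
# holds more than a single file. Stops at the first failed step so that
# nothing is deleted before it has been uploaded.
transfer_group() {
  # Word splitting is acceptable here only because the names are
  # assumed to contain no whitespace or glob characters.
  for name in $(list_group "$1"); do
    aws s3 cp "$SRC/$name" "./$name" --profile "$PROFILE" &&
      aws s3 cp "./$name" "$DST/$name" --profile "$PROFILE" &&
      rm -f -- "./$name" || return 1
  done
}

# Example: move every file matching my_file_0_1_*
# transfer_group 1
```

Because the list is sorted, an interrupted run stops at a predictable point in the sequence, which makes it easier to see which files are still left to move.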
