1
votes

I'm attempting to sync the contents of an S3 bucket (actually a DigitalOcean Space) to my local hard drive with aws s3 sync or aws s3 cp --recursive.

I've tried both the sync and cp commands, but both stop after 1,000 objects. I know the sync documentation (https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html) says the --page-size flag is limited to 1,000 objects, and everything I've seen talks about syncing entire buckets, yet my syncs stop at 1,000 objects and I have 7 million to sync.

aws s3 cp s3://MYBUCKET ~/Documents/temp_space --recursive --source-region sfo2 --profile MYPROFILE --endpoint-url https://sfo2.digitaloceanspaces.com

aws s3 sync s3://MYBUCKET ~/Documents/temp_space --source-region sfo2 --profile MYPROFILE --endpoint-url https://sfo2.digitaloceanspaces.com

I expect to be able to sync the entire bucket, not just the first 1,000 objects.

2
Have you tried to do this, or is your question based on reading the docs only? I just uploaded 52,348 objects using `aws s3 cp --recursive --quiet . s3://my-bucket`, which took about 20 minutes. – hephalump
Perhaps this is a limitation of DigitalOcean? – John Rotenstein

2 Answers

1
votes

The `--page-size` parameter limits the number of results in a single request, not the total number.

By way of example, consider a scenario where you have a directory with 5,000 objects that you wish to copy to an S3 bucket. Your command would look something like `aws s3 cp . s3://your-bucket --recursive`. This copies the entire contents of your current directory, all 5,000 objects, to your bucket.

The default (and maximum) `--page-size` is 1,000 objects, so even though we haven't specified a `--page-size`, the AWS CLI handles making five requests (5 × 1,000 objects) under the hood to copy all 5,000 objects.
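To make the paging behaviour concrete, here is a minimal boto3 sketch (the bucket name is a placeholder): PageSize plays the same role as the CLI's --page-size, capping keys per request, while the paginator keeps issuing requests until the listing is exhausted.

```python
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

total, requests = 0, 0
# PageSize maps to the CLI's --page-size: it caps keys per request,
# not the total; the paginator keeps fetching pages until the end.
for page in paginator.paginate(
    Bucket="your-bucket",  # placeholder bucket name
    PaginationConfig={"PageSize": 1000},
):
    requests += 1
    total += len(page.get("Contents", []))

print(f"listed {total} objects in {requests} requests")
```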

Generally, you can simply ignore this optional parameter and run your `aws s3 cp` or `aws s3 sync` without issue. If your requests time out, add the `--page-size` parameter with a value below 1,000 to address the timeout.

0
votes

In short, you can use https://github.com/s3tools/s3cmd as a replacement for the AWS CLI.

Version 1 of the S3 ListObjects API returns at most 1,000 entries per request, and a marker can be supplied to fetch the next page. But `aws s3 sync` doesn't support the marker for some reason.

https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjects.html
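For illustration, a minimal sketch of v1 marker paging with boto3, assuming a Space in sfo2 (the endpoint and bucket name are placeholders):

```python
import boto3

# Placeholder endpoint and bucket for a DigitalOcean Space in sfo2.
s3 = boto3.client(
    "s3",
    region_name="sfo2",
    endpoint_url="https://sfo2.digitaloceanspaces.com",
)

marker = ""
while True:
    resp = s3.list_objects(Bucket="your-bucket", Marker=marker, MaxKeys=1000)
    for obj in resp.get("Contents", []):
        print(obj["Key"])
    if not resp.get("IsTruncated"):
        break
    # V1 quirk: NextMarker is only returned when a Delimiter is used,
    # so fall back to the last key of the current page.
    marker = resp.get("NextMarker", resp["Contents"][-1]["Key"])
```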

Version 2 of the ListObjects API replaces the marker with a ContinuationToken, and the AWS CLI works better with v2.

https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html
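The v2 flow, for comparison, threads a ContinuationToken through successive requests (placeholder bucket again); this is the listing style the newer AWS CLI relies on:

```python
import boto3

s3 = boto3.client("s3")

token = None
while True:
    kwargs = {"Bucket": "your-bucket", "MaxKeys": 1000}  # placeholder bucket
    if token:
        kwargs["ContinuationToken"] = token
    resp = s3.list_objects_v2(**kwargs)
    for obj in resp.get("Contents", []):
        print(obj["Key"])
    if not resp.get("IsTruncated"):
        break
    token = resp["NextContinuationToken"]
```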

DigitalOcean, however, does not currently support the version 2 list type.

So you have to use s3cmd.
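For example (a sketch only; the bucket and local path are taken from the question, and you first point s3cmd at your Space's endpoint via the interactive configuration):

s3cmd --configure
s3cmd sync s3://MYBUCKET ~/Documents/temp_space/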