12 votes

I am trying to copy files between two S3 buckets in different regions using the AWS Command Line Interface (CLI) on an EC2 server.

region info:
EC2 instance: us-west-2
S3 origin: us-east-1
S3 destination: us-west-2

The following commands work perfectly from the EC2 server:
aws s3 cp s3://n-virginia/origin s3://n-virginia/destination --recursive --source-region us-east-1 --region us-east-1 --profile my_profile

aws s3 cp s3://oregon/origin s3://oregon/destination --recursive --source-region us-west-2 --region us-west-2 --profile my_profile

I need to run the following command from the EC2 server:
aws s3 cp s3://n-virginia/origin s3://oregon/destination --recursive --source-region us-east-1 --region us-west-2 --profile my_profile

If I run that command from a local machine it works, but if I run it from the EC2 server that I used for the previous two commands I get the following error:

Error:"A client error (AccessDenied) occurred when calling the CopyObject operation: VPC endpoints do not support cross-region requests"

I am able to copy the files from the origin bucket to the EC2 server and then copy from the EC2 server to the destination bucket, but this is not an acceptable solution in production. I don't understand why the command works on a local machine but not on the EC2 server ("my_profile" is identical on both machines).

You are using a VPC endpoint for S3 on the EC2 server. I assume this is because it is in a private subnet of the VPC. So you are accessing S3 via a different method on the EC2 server than you are on your local computer. That's why it behaves differently. If the VPC endpoint doesn't support what you are trying to do, then there really is no work-around besides copying all the files to the EC2 server first. - Mark B
Please tell me if I understand this correctly. The first command uses a VPC endpoint to connect to a single bucket and it works. The second command uses a different endpoint to connect to a different single bucket and it works. The third command uses one of the previous VPC endpoints trying to connect to two different buckets and fails connecting to one of them? - Lazer
It has nothing to do with the ability to "connect to a bucket". The first command copies files from one S3 bucket to another, where both buckets are in the same region. The second command copies files from one S3 bucket to another, where both buckets are in the same region. The third command copies files from one S3 bucket to another, where the buckets reside in different regions. The error message is saying that specific scenario is not supported by VPC endpoints. - Mark B
Enable cross-region replication and let Amazon run it for you. - Frederic Henri
Looks like the solution is to turn off VPC endpoints. I don't know all the details; I think it might be easier, if this is a one-time operation, to make a new VPC without endpoints and do the operation there. - ThorSummoner

2 Answers

6 votes

As pointed out in the comments, the problem is that your VPC has an S3 endpoint, and cross-region copies are not supported through it.

To fix that, either temporarily disable the VPC endpoint by updating your VPC route table, or create a new VPC without a VPC endpoint and launch an EC2 instance there.
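For example (a rough sketch only; the endpoint and route table IDs below are placeholders you would look up in your own account), you could detach the S3 endpoint from the instance's route table for the duration of the copy and re-attach it afterwards:

# find the S3 VPC endpoint and the route tables it is attached to
aws ec2 describe-vpc-endpoints --query "VpcEndpoints[].{Id:VpcEndpointId,Service:ServiceName,RouteTables:RouteTableIds}"

# detach it from the route table used by the EC2 instance
aws ec2 modify-vpc-endpoint --vpc-endpoint-id vpce-0123456789abcdef0 --remove-route-table-ids rtb-0123456789abcdef0

# ...run the cross-region copy, then re-attach the endpoint...
aws ec2 modify-vpc-endpoint --vpc-endpoint-id vpce-0123456789abcdef0 --add-route-table-ids rtb-0123456789abcdef0

Keep in mind that if the instance is in a private subnet with no NAT gateway, removing the endpoint route cuts off S3 access entirely, which is why launching an instance in a separate VPC without an endpoint may be simpler.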

Cross-region replication would be ideal, but as pointed out, it only affects new items added to the bucket.
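If replication does work for your use case going forward, the setup is roughly the following (a sketch only; the IAM role in replication.json is a placeholder, and versioning must be enabled on both buckets first):

# versioning is required on both buckets before replication can be enabled
aws s3api put-bucket-versioning --bucket n-virginia --versioning-configuration Status=Enabled
aws s3api put-bucket-versioning --bucket oregon --versioning-configuration Status=Enabled

# replication.json names an IAM role S3 can assume and the destination bucket, e.g.:
# {
#   "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
#   "Rules": [{"Status": "Enabled", "Prefix": "origin/", "Destination": {"Bucket": "arn:aws:s3:::oregon"}}]
# }
aws s3api put-bucket-replication --bucket n-virginia --replication-configuration file://replication.json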

Instead of aws s3 cp you probably want aws s3 sync. Sync only copies changed files, so you can rerun it if it is interrupted. For example:

aws s3 sync s3://n-virginia/origin s3://oregon/destination
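Assuming the same region and profile flags from the question, the full command would presumably look like:

aws s3 sync s3://n-virginia/origin s3://oregon/destination --source-region us-east-1 --region us-west-2 --profile my_profile

(The same VPC endpoint limitation applies to sync as to cp, so it still has to run from a machine that is not restricted to the endpoint.)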

Note also that both cp and sync do NOT preserve ACLs. So if you have changed ACL permissions on individual files, they will all be reset to the default after the copy. There are other tools that are supposed to preserve ACLs, like https://s3tools.org, which seems to work for me.
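If only a few objects have custom ACLs, another workaround (a sketch; the object keys below are illustrative) is to re-apply each ACL after the copy using the low-level s3api commands:

# read the ACL from the source object and apply it to the copied object
aws s3api get-object-acl --bucket n-virginia --key origin/some-file.txt --profile my_profile > acl.json
aws s3api put-object-acl --bucket oregon --key destination/some-file.txt --access-control-policy file://acl.json --region us-west-2 --profile my_profile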

1 vote

If downloading the entire bucket locally is not feasible because of the disk space required, you can download, upload, and remove the files a few seconds' worth at a time.

The shell snippet below starts a background download of the entire source bucket to local disk. While downloaded files keep appearing in the current directory, the loop calls aws s3 mv, which uploads them to the destination bucket and removes them locally.

mkdir tempdir
aws s3 sync s3://source-bucket . &
sleep 5
while [ "$(ls *.txt 2>/dev/null | wc -l)" -gt 0 ] ; do mv *.txt tempdir/ ; aws s3 mv tempdir/ s3://destination-bucket/ --recursive ; done

The aws s3 sync command creates temporary files with a random suffix while it is writing objects to disk, and the aws s3 mv command will unfortunately sometimes upload these partial files. To avoid this, move a batch of the finished files, e.g. all .txt files, to a temporary directory and upload only those.

In practice I see no more than 50 MB of disk used locally (fewer than 500 files, each under 100 KB).