I'm using gsutil rsync, copying from s3 to gs, and I'm getting the following error after gsutil has gone partway through a bucket:

Caught non-retryable exception while listing s3://[bucket]/: BadRequestException: 400 None
CommandException: Caught non-retryable exception - aborting rsync

This is undesirable behavior, because I can manually copy other files from s3 to gs. I can't work around it with the "-C" switch, since the error occurs while listing the bucket, not while copying.

Edit: It appears that if a filename in s3 contains a "#", gsutil replaces it with "?versionId=". For example:

S3 filename: Updaet#2_Montgomery Building Permits.xlsx

GS lists in debug output as: Updaet?versionId=2_Montgomery Building Permits.xlsx

2 Answers

2
votes

Can you please provide more details about this failure by running:

gsutil -D rsync your-source your-destination

and then excerpting the HTTP request/response that shows the error? When you do please redact the authorization: header.

If you'd prefer not to post the details of your request on the public forum, you can email them to me at gs-team@google.com.

Thanks.

0
votes

This same thing happened to me yesterday, and the '#' is indeed the problem.

The issue appears to be in boto, not necessarily gsutil, though I don't know exactly where the fix belongs. BotoTranslation._StorageUriForObject() calls boto.storage_uri(), which uses VERSION_RE ('(?P<versionless_uri_str>.+)#(?P<version_id>.+)$') to find a version in the uri_str/path. If the object name contains a '#', everything after it is therefore treated as an S3 version ID. I don't see any current way to escape or encode the '#' so that it isn't treated as a version separator.
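
To illustrate, here is a minimal sketch of what that pattern does to an object name containing a '#'. It uses only Python's re module with the regex quoted above. The helper split_version and the bucket name example-bucket are made up for the example, and this is not boto's actual code path, which may differ between versions.

```python
import re

# Pattern as quoted above; assumed to match what boto uses.
VERSION_RE = re.compile(r'(?P<versionless_uri_str>.+)#(?P<version_id>.+)$')


def split_version(uri_str):
    """Hypothetical helper: return (versionless_uri, version_id).

    version_id is None when the pattern does not match.
    """
    match = VERSION_RE.search(uri_str)
    if match is None:
        return uri_str, None
    return match.group('versionless_uri_str'), match.group('version_id')


# example-bucket is a placeholder; the object name is the one from the question.
uri = 's3://example-bucket/Updaet#2_Montgomery Building Permits.xlsx'
name, version = split_version(uri)
print(name)     # s3://example-bucket/Updaet
print(version)  # 2_Montgomery Building Permits.xlsx
```

The part of the object name after the '#' ends up in version_id, which is consistent with the "?versionId=2_Montgomery Building Permits.xlsx" seen in the debug output.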