I need to run a gsutil rsync from a Google Cloud Storage bucket to a local directory, and the transfer might be interrupted or fail because of a poor connection. So I tested what happens if I simply run rsync again to continue where I left off, and it gives an error trying to remove a .gstmp file that was left behind by the first, interrupted rsync.

Let's say I have a bucket with these files:

test1.txt
test2.txt
test3.txt

And I run this gsutil rsync command:

user@machine:~/$ gsutil rsync -C -d -r gs://bucket_name ~/tmp/

I interrupt it while test2.txt is being copied, which leaves a test2.txt_.gstmp in the target directory. When I run the same rsync again, this happens:

user@machine:~/$ gsutil rsync -C -d -r gs://bucket_name ~/tmp/
Building synchronization state...
Starting synchronization...
Copying gs://bucket_name/test3.txt...
Removing file:///home/user/tmp/test2.txt_.gstmp
OSError: No such file or directory.

So it picks up where it was interrupted last time and also flags the .gstmp file for removal, which is great. But when it actually tries to remove the file, it is somehow already gone and I get the OSError (as if it tries to remove it twice). If I then run the same command again, everything works fine, because the .gstmp file is no longer there.
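Since a rerun succeeds, one stopgap is to wrap the command in a retry loop. This is only a minimal sketch: `retry` is a hypothetical helper, not part of gsutil, and it assumes gsutil exits with a non-zero status when the OSError occurs, which I have not verified.

```shell
#!/bin/sh
# retry MAX CMD...: run CMD until it exits 0, at most MAX times.
# Hypothetical helper; nothing here is gsutil-specific.
retry() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Usage with the command from above (assumes a non-zero exit on failure):
# retry 3 gsutil rsync -C -d -r gs://bucket_name ~/tmp/
```

This only papers over the problem: it does not explain why the removal fails in the first place.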

Does anyone have any idea what could cause this, and how to avoid it?

EDIT:

It looks like this happens because gsutil cleans up .gstmp files regardless. So if the .gstmp file is also part of the synchronization state being built, gsutil tries to delete it twice (first as part of the cleanup, and then again as part of the synchronization), which causes the OSError. My current fix is to add an exclude regex (-x) to the rsync command:

gsutil rsync -C -d -r -x ".*gstmp$" gs://bucket_name ~/tmp/

Now it ignores the .gstmp file during the rsync itself, but still deletes it as part of the cleanup.
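To see which names the exclude pattern catches, here is a rough local check. It is a sketch under one assumption: `grep -E` stands in for the Python regular expression matching that gsutil applies to `-x`, and the two agree for a pattern this simple. `is_excluded` and `DEFAULT_PATTERN` are names I made up for the illustration.

```shell
#!/bin/sh
# Pattern passed to "gsutil rsync -x" in the command above.
DEFAULT_PATTERN='.*gstmp$'

# is_excluded NAME [PATTERN]: succeed if NAME matches the exclude pattern.
# grep -E is only a stand-in for gsutil's Python regex matching.
is_excluded() {
    printf '%s\n' "$1" | grep -Eq -- "${2:-$DEFAULT_PATTERN}"
}

for name in test1.txt test2.txt test2.txt_.gstmp; do
    if is_excluded "$name"; then
        echo "$name: excluded"
    else
        echo "$name: synced"
    fi
done
# prints:
# test1.txt: synced
# test2.txt: synced
# test2.txt_.gstmp: excluded
```

Note that `.*gstmp$` would also exclude any real file whose name merely ends in "gstmp". A stricter pattern such as `_\.gstmp$` can be tried by passing it as the second argument.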

Which version of the Cloud SDK are you running this on? - rsalinas
I'm running gsutil version: 4.47 - Bart

1 Answer

I tried to reproduce your use case:

 gsutil rsync -C -d -r gs://syncbucket  temp/
 #Building synchronization state...
 #Starting synchronization...
 #Copying gs://syncbucket/test1.txt...
 #Copying gs://syncbucket/test2.txt...
 #Copying gs://syncbucket/test3.txt...
 #^CCaught CTRL-C (signal 2) - exiting

 ls temp/
 #test1.txt  test2.txt  test3.txt_.gstmp

 gsutil rsync -C -d -r gs://syncbucket  temp/
 #Building synchronization state...
 #Starting synchronization...
 #Copying gs://syncbucket/test3.txt...
 #Removing file://temp/test3.txt_.gstmp
 #OSError: No such file or directory.

 ls temp/
 #test1.txt  test2.txt  test3.txt

I am not sure what the OSError message means, but the command ran successfully and I can see all my files from GCS locally. I did not need to run gsutil rsync three times.
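Instead of checking with ls by eye, a small script can confirm that no partial downloads are left after the second run. This is just a sketch: `leftover_gstmp` and `check_sync_dir` are hypothetical helper names, and it assumes temporary files always end in `_.gstmp`, as they do in the listings above.

```shell
#!/bin/sh
# leftover_gstmp DIR: print any *_.gstmp files remaining under DIR.
leftover_gstmp() {
    find "$1" -type f -name '*_.gstmp'
}

# check_sync_dir DIR: return 1 if partial downloads remain, 0 otherwise.
check_sync_dir() {
    if [ -n "$(leftover_gstmp "$1")" ]; then
        echo "partial downloads remain in $1"
        return 1
    fi
    echo "no partial downloads in $1"
}

# Usage after the second rsync:
# check_sync_dir temp/
```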