I need to perform a gsutil rsync from a Google Cloud Storage bucket to a local directory, and the transfer might be interrupted or fail because of a poor connection. So I tested what happens if I simply run rsync again to continue where it left off: it gives an error while trying to remove a .gstmp file that the first, interrupted rsync left behind.
Let's say I have a bucket with these files:
test1.txt
test2.txt
test3.txt
And I run this gsutil rsync command:
user@machine:~/$ gsutil rsync -C -d -r gs://bucket_name ~/tmp/
I interrupt it while test2.txt is being copied, which leaves a test2.txt_.gstmp file in the target directory. Now when I run the same rsync again, this happens:
user@machine:~/$ gsutil rsync -C -d -r gs://bucket_name ~/tmp/
Building synchronization state...
Starting synchronization...
Copying gs://bucket_name/test3.txt...
Removing file:///home/user/tmp/test2.txt_.gstmp
OSError: No such file or directory.
So it picks up where it was interrupted last time and also flags the .gstmp file for removal, which is great. But when it actually tries to remove the file, the file is somehow already gone and I get the OSError (as if it tries to remove it twice). If I then run the same command again, everything works fine, because the .gstmp file no longer exists.
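Because a rerun eventually succeeds in this scenario, one generic workaround is to wrap the command in a retry loop. The sketch below is not a gsutil feature: `retry` and `RETRY_DELAY` are hypothetical names, and it assumes gsutil exits with a non-zero status both when the transfer is interrupted and when it hits the OSError.

```shell
# Hypothetical helper: rerun a command until it exits with status 0
# or until max_attempts runs have failed.
retry() {
  local max_attempts=$1
  shift
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying..." >&2
    attempt=$((attempt + 1))
    # Delay between attempts; RETRY_DELAY is a made-up knob, default 5 seconds.
    sleep "${RETRY_DELAY:-5}"
  done
  return 0
}

# Usage (assumed): rerun the same rsync up to 5 times.
# retry 5 gsutil rsync -C -d -r gs://bucket_name ~/tmp/
```

Since the command is idempotent (each run only transfers what is still missing), simply repeating it is safe; the loop only automates the manual rerun described above.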
Does anyone have any idea what could cause this, and how to avoid it?
EDIT:
It looks like this happens because gsutil cleans up .gstmp files regardless, so if the .gstmp file is also part of the synchronization state being built, gsutil tries to delete it twice (first during the cleanup, then again as part of the synchronization), which causes the OSError. My current fix is to add an exclusion regex (-x) to the rsync command:
gsutil rsync -C -d -r -x ".*gstmp$" gs://bucket_name ~/tmp/
Now rsync ignores the .gstmp file during synchronization, but gsutil still deletes it as part of the cleanup.
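To check which names the -x pattern excludes, here is a rough sketch that uses `grep -E` as a stand-in for gsutil's Python regular expression matching. It assumes gsutil applies the pattern to each file's path relative to the sync root; for a simple pattern like this one, `grep -E` and Python's `re` should agree. `is_excluded` is a hypothetical helper name.

```shell
# Pattern taken from the -x option above.
pattern='.*gstmp$'

# Hypothetical helper: succeeds if the name would be excluded by the pattern.
is_excluded() {
  printf '%s\n' "$1" | grep -Eq "$pattern"
}

for name in test1.txt test2.txt test2.txt_.gstmp test3.txt; do
  if is_excluded "$name"; then
    echo "excluded: $name"
  else
    echo "synced: $name"
  fi
done
# prints:
# synced: test1.txt
# synced: test2.txt
# excluded: test2.txt_.gstmp
# synced: test3.txt
```

Note that the dot in the pattern is unescaped, so it matches any name ending in "gstmp"; a stricter variant such as '.*\.gstmp$' would only match the literal ".gstmp" suffix and still excludes test2.txt_.gstmp.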