I want to rsync a bucket containing 100M files from S3 to Google Cloud Storage. I ran a quick dry run on a c3.8xlarge instance:
$ time gsutil -m rsync -r -n s3://s3-bucket/ gs://gs-bucket/
Building synchronization state...
At source listing 10000...
^C
real 4m11.946s
user 0m0.560s
sys 0m0.268s
That's about 4 minutes for the first 10k files. At this rate, it's going to take roughly 28 days just to compute the sync state for 100M files. Is there anything I can do to speed this up?
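For reference, here is the back-of-the-envelope extrapolation behind that estimate, as a small Python sketch. It assumes the listing rate stays constant for the whole bucket, which may well not hold; `estimate_days` is just a name I made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def estimate_days(total_files: int, batch_files: int, batch_seconds: float) -> float:
    """Linearly extrapolate listing time from one observed batch.

    Assumption: every batch of `batch_files` objects takes the same
    `batch_seconds` to list as the one that was measured.
    """
    batches = total_files / batch_files
    return batches * batch_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Rounded figure from the dry run: ~4 minutes per 10k files.
    print(f"{estimate_days(100_000_000, 10_000, 4 * 60):.1f} days")
    # Using the full wall-clock time reported by `time` (4m11.946s).
    print(f"{estimate_days(100_000_000, 10_000, 4 * 60 + 11.946):.1f} days")
```

Since I interrupted the run some time after the "10000" progress line was printed, the real per-batch time could be shorter than what I measured, so treat these numbers as a rough upper bound.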
I also noticed (and fixed) the following warning:
WARNING: gsutil rsync uses hashes when modification time is not available at both the source and destination. Your crcmod installation isn't using the module's C extension, so checksumming will run very slowly. If this is your first rsync since updating gsutil, this rsync can take significantly longer than usual. For help installing the extension, please see "gsutil help crcmod".
Are the file hashes being computed during this phase, or am I just waiting for the 100M files to be listed?