1
votes

I have a project that regularly needs to upload millions of tiny (1 - 3 KB) image files to Google Cloud Storage. What's the recommended method/library to do this? I'm currently using gsutil but wonder if there's a better library. I recently came across google-cloud but it seems to be even slower (using blob.upload_from_filename()).

I'd like to be able to do this via Python (windows) but am open to other options if they provide significant performance advantages.

Any suggestions?

1

1 Answers

1
votes

A lot of optimization has gone into gsutil, I doubt using the raw libraries is going to be faster without considerable effort (though it may be more tuned for transferring large files than large numbers of files).

Try adding the -m flag to your cp command to multi-thread your upload. https://cloud.google.com/storage/docs/gsutil/commands/cp

After that the only thing you can probably do is parallelize across multiple machines (each machine copies a subset of the files).