1
votes

I have a directory named bar in a Google Cloud Storage bucket foo. There are around 1 million small files (each around 1-2 KB) in directory bar.

According to this reference, if I have a large number of files I should use the gsutil -m option to download them, like this:

gsutil -m cp -r gs://foo/bar/ /home/username/local_dir

But given the total number of files (around 10^6), the whole download process is still slow.

Is there a way to compress the whole directory in Cloud Storage and then download the compressed archive to a local folder?


1 Answer

2
votes

There's no way to compress the directory in Cloud Storage before copying, but you could speed up the copy by distributing the work across multiple machines, with each machine handling a different slice of the object names. For example:

machine1 does gsutil -m cp -r gs://<bucket>/a* local_dir

machine2 does gsutil -m cp -r gs://<bucket>/b* local_dir, and so on.

Depending on how your files are named, you may need to adjust the above, but hopefully you get the idea.
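
For illustration, here's a minimal per-machine sketch of that idea, assuming the bucket and path from the question (gs://foo/bar) and that object names start with lowercase letters; the PREFIXES list and local directory are assumptions you'd adjust to your own naming scheme, giving each machine a different set of prefixes.

#!/bin/bash
# Prefixes assigned to this particular machine (assumed; vary per machine).
PREFIXES="a b c d"
LOCAL_DIR=/home/username/local_dir

mkdir -p "$LOCAL_DIR"
for p in $PREFIXES; do
  # Quote the wildcard so gsutil, not the local shell, expands it.
  gsutil -m cp -r "gs://foo/bar/${p}*" "$LOCAL_DIR"
done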