1
votes

I want to set up Google Cloud Storage as my data lake, and I'm using Pub/Sub + Dataflow to save interactions into it. Dataflow creates a new file every 5 minutes and stores it in a GCS folder. This will eventually lead to a lot of files inside that folder. Is there any limit on the number of files that can be saved inside a GCS folder?

2 Answers

5
votes

There is no practical limit. Bear in mind there are not even really "folders" in Cloud Storage. There are just objects with paths whose names look like folders, for the purpose of helping you organize and navigate all that content.
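
To illustrate the point, here is a minimal sketch in plain Python. It does not call Cloud Storage at all; the object names and the `list_folder` helper are made up for illustration. It shows how a flat set of object names can be presented as "folders" simply by grouping on a prefix and a `/` delimiter:

```python
def list_folder(object_names, prefix="", delimiter="/"):
    """Emulate a folder listing over a flat namespace of object names.

    Returns (files, subfolders) found directly under `prefix`.
    """
    files, subfolders = [], set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something follows the delimiter, so this looks like a subfolder.
            subfolders.add(prefix + head + delimiter)
        elif rest:
            # No delimiter left, so this is an object directly under prefix.
            files.append(name)
    return sorted(files), sorted(subfolders)


# Hypothetical object names; the bucket only ever stores these flat strings.
objects = [
    "interactions/2021/01/01/part-0000.json",
    "interactions/2021/01/01/part-0001.json",
    "interactions/2021/01/02/part-0000.json",
    "interactions/README.txt",
]

files, subfolders = list_folder(objects, prefix="interactions/")
print(files)       # ['interactions/README.txt']
print(subfolders)  # ['interactions/2021/']
```

The Cloud Storage object listing API accepts `prefix` and `delimiter` parameters that give a comparable view over the real object names.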

0
votes

The limit is roughly 9.2 quintillion objects, which would take many years to even create.

We store some of our services as zero-compute JSON files with sub-folders in GCP buckets. I wanted to confirm we could store more than roughly 4.3 billion (2^32) folders in a bucket, so we could access our files by ID just like we would in a database. Currently we are up to over 100k files per folder; we basically use GCP buckets as a type of database with a read:write ratio well beyond 1M:1.

I asked our engineering team to open a ticket and confirm that our usage was practical and that passing roughly 4.3 billion items was possible. Google Cloud support confirmed that there are customers using Cloud Storage today who go well beyond the 32-bit limit of about 4.3 billion, into the trillions, and that the main index currently involves a 64-bit pointer, which may be the only limit.

A signed 64-bit value tops out at about 9.2 quintillion, or 9,223,372,036,854,775,807 (2^63 - 1) to be exact.
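
As a quick check of the arithmetic, the figures above can be reproduced in a few lines of Python. This is only a sketch: support mentioned a 64-bit pointer but not whether it is signed or unsigned, so both bounds are shown as an assumption.

```python
# Number of distinct values a 32-bit index can address.
UINT32_COUNT = 2**32

# Largest signed and unsigned 64-bit values. Which one applies to the
# Cloud Storage index is an assumption; the source only says "64 bit".
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1

print(f"{UINT32_COUNT:,}")  # 4,294,967,296
print(f"{INT64_MAX:,}")     # 9,223,372,036,854,775,807
print(f"{UINT64_MAX:,}")    # 18,446,744,073,709,551,615
```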

They do have other, related limits, such as an initial rate of about 1k writes and 5k reads per second per bucket. That rate can auto-scale, but it has nuances, so if you think you may hit it, you may want to read about it here: https://cloud.google.com/storage/docs/request-rate.

For reference, here are their general storage quotas and limits: https://cloud.google.com/storage/quotas

...it does not describe the 64-bit / 9.2 quintillion item limitation, possibly because that limit would be practically impossible to reach: creating that many objects would take far longer than a decade at any realistic write rate, by which time they would probably have engineered beyond 64-bit :)
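
For a rough sense of scale, here is a back-of-the-envelope sketch of how long it would take to create 2^63 - 1 objects. The sustained write rates are assumptions chosen for illustration (1,000 writes per second is the initial per-bucket figure mentioned above); `years_to_create` is a hypothetical helper, not part of any Google library.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, ignoring leap years

INT64_MAX = 2**63 - 1


def years_to_create(object_count, writes_per_second):
    """Years needed to write `object_count` objects at a constant rate."""
    return object_count / (writes_per_second * SECONDS_PER_YEAR)


# Assumed sustained write rates, in objects per second.
for rate in (1_000, 1_000_000, 1_000_000_000):
    years = years_to_create(INT64_MAX, rate)
    print(f"{rate:>13,} writes/s -> {years:,.0f} years")

# The pipeline in the question writes one file every 5 minutes (300 s).
files_per_year = SECONDS_PER_YEAR // 300
print(f"{files_per_year:,} files per year")  # 105,120 files per year
```

Under these assumptions, 1,000 writes per second works out to roughly 292 million years, and even a billion writes per second needs roughly 292 years. One file every 5 minutes is about 105,000 files per year, nowhere near either bound.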