1
votes

The documentation states that a single split should not be bigger than 'a few GB'.

  • Is there a hard limit at which Cloud Spanner will stop storing more data in one split?
  • What are the implications of splits growing to, e.g., 20-30 GB?
    • I can think of problems when those splits need to be moved between instances while being read or written.

I know the second point sounds like we should split up our primary key or add a sharding key as the first part of the primary key.

But what if you have hundreds of customers with really big product catalogs, and you need to interleave the brand and category tables so you can join on them? Alternative approaches that store one product catalog across several splits become very slow on secondary-index queries (like: query all active products in a catalog).
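To make the trade-off concrete, here is a minimal sketch of the sharding-key idea in Python. Everything in it is an assumption made for illustration: the names `shard_id`, `primary_key`, and `catalog_scan_prefixes`, the shard count of 16, and the key layout `(shard_id, catalog_id, product_id)` are hypothetical and are not taken from the Cloud Spanner documentation or client library.

```python
import hashlib

# Hypothetical shard count; in practice it would be chosen so that one
# shard of the largest catalog stays within a few GB.
NUM_SHARDS = 16


def shard_id(product_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a product id to a stable shard number in [0, num_shards)."""
    digest = hashlib.sha256(product_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


def primary_key(catalog_id: str, product_id: str) -> tuple:
    """Build the assumed (shard_id, catalog_id, product_id) primary key."""
    return (shard_id(product_id), catalog_id, product_id)


def catalog_scan_prefixes(catalog_id: str, num_shards: int = NUM_SHARDS) -> list:
    """List every key prefix that must be scanned to read one whole catalog."""
    return [(shard, catalog_id) for shard in range(num_shards)]
```

With this layout, the rows of one catalog are spread over up to `NUM_SHARDS` key ranges, so a query such as "all active products in a catalog" has to visit every prefix returned by `catalog_scan_prefixes`, which is the fan-out cost described above.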

Thanks a lot in advance; this would help us a lot in understanding Cloud Spanner better for our planned production use. Christian Gintenreiter

1
So, what have you tried for this? Include that in the question and post. user9662188
You'll get worse performance if you create large splits. We recommend designing your schema to avoid putting more than a few GB into a single split. Mairbek Khadikov

1 Answer

3
votes

A split can only be served by a single node, so a very large split may turn that node into a performance bottleneck. You may start to see performance degradation once a split grows beyond 2 GB. The hard limit on split size is bounded by the storage limit for a single node, which is 2 TB.
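As a rough back-of-the-envelope illustration, and not an official sizing formula, the 2 GB figure above can be turned into a minimum number of key ranges for a catalog of a given size. The function name `min_key_ranges` is hypothetical, and the sketch assumes the data is spread evenly across those ranges.

```python
import math

# Split size at which degradation may start, as quoted in the answer above.
DEGRADATION_THRESHOLD_GB = 2


def min_key_ranges(catalog_size_gb: float,
                   target_split_gb: float = DEGRADATION_THRESHOLD_GB) -> int:
    """Smallest number of evenly filled key ranges that keeps each one
    at or below target_split_gb (assumes a perfectly even distribution)."""
    return max(1, math.ceil(catalog_size_gb / target_split_gb))


print(min_key_ranges(20))  # 10
print(min_key_ranges(30))  # 15
```

Under these assumptions, the 20-30 GB catalogs mentioned in the question would need to be spread over roughly 10 to 15 key ranges to stay at about 2 GB each.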

Can you please provide some more details about your schema and interleaving?