Optimizing hierarchical data sets for reads of whole hierarchies

Question

I am migrating an app from Oracle to Google Spanner. One of the cases we came across are relationships between rows in the same table.

These relationships have a tree-like structure, always having one parent and one root of the hierarchy altogether. Bottom up and top to bottom query patterns are possible.

There will be cases where we'd like to have efficient access to the whole record-tree. This data access pattern is latency critical.

The application previously used Oracle and their hierarchical queries (connect by) and was highly optimized for that vendor.

The number of rows in one tree-fetch would range between 1-2000. Table will have millions of sych rows.

Rows of that table do have interleaved child table rows within.

Would it make much sense to optimize the table for better data locality by denormalizing the model and redundantly adding the root record's id as the first column of the primary key of that table for faster top-down queries?

It would go like this:
root_id | own_id | parent_id 1 | 1 | 1 1 | 2 | 1 1 | 3 | 2 4 | 4 | 4 4 | 5 | 4 4 | 5 | 4

Ie. we are considering to make PK consist of (root_id, own_id) here. (values are superficial, we can spread them out in real scenario).

What is the chance for such rows, containing same first element of the PK to go to the same split? Would there be actual benefit to do so?

Rose Liu Rose Liu · Accepted Answer · 2018-03-06T01:35:52

Cloud Spanner supports parent-child table relationships to declare a data locality relationship between two logically independent tables, and physically co-locate their rows for efficient retrieval. Please see this link for more information: https://cloud.google.com/spanner/docs/schema-and-data-model#parent-child_table_relationships

For example, assuming we have a table 'Root' with primary key 'root_id', we can declare the table 'Own' to be a child of the 'Root' table. The primary key of the parent table becomes a prefix to the primary key of the child table. So table 'Own' could have a primary key of (root_id, own_id). All rows of table 'Own' having the same 'root_id' would be located in the same split.

Splits do have a max size limit. As a rule of thumb, the size of every set of related rows in a hierarchy of parent-child tables should be less than a few GiB.

Optimizing hierarchical data sets for reads of whole hierarchies

1 Answers