I am migrating an app from Oracle to Google Spanner. One of the cases we came across are relationships between rows in the same table.
These relationships have a tree-like structure, always having one parent and one root of the hierarchy altogether. Bottom up and top to bottom query patterns are possible.
There will be cases where we'd like to have efficient access to the whole record-tree. This data access pattern is latency critical.
The application previously used Oracle and their hierarchical queries (connect by
) and was highly optimized for that vendor.
The number of rows in one tree-fetch would range between 1-2000. Table will have millions of sych rows.
Rows of that table do have interleaved child table rows within.
Would it make much sense to optimize the table for better data locality by denormalizing the model and redundantly adding the root record's id as the first column of the primary key of that table for faster top-down queries?
It would go like this: root_id | own_id | parent_id
1 | 1 | 1
1 | 2 | 1
1 | 3 | 2
4 | 4 | 4
4 | 5 | 4
4 | 5 | 4
Ie. we are considering to make PK consist of (root_id, own_id) here. (values are superficial, we can spread them out in real scenario).
What is the chance for such rows, containing same first element of the PK to go to the same split? Would there be actual benefit to do so?