0
votes

I have two models which naturally exist in a parent-child relationship. IDs for the child are unique within the context of a single parent, but not necessarily globally, and whenever I want to query a specific child, I'll have the IDs for both parent and child available.

I can implement this two ways.

  1. Make the datastore key name of each child entity be the string "<parent_id>,<child_id>", and do joins and splits to process the IDs.

  2. Use parent keys.

Option 2 sounds like the obvious winner from a code perspective, but will it hurt performance on writes? If I never use transactions, is there still overhead for concurrent writes to different children of the same parent? Is the datastore smart enough to know that if I do two transactions in the same entity group which can't affect each other, they should both still apply? Or should parent keys be avoided if locking isn't necessary?


3 Answers

2
votes

In terms of the datastore itself, parent/child relationships are conceptual only. That is, the actual entities are not joined in any way.

A key consists of an optional parent key, a kind, and an ID (or key name). The parent key embedded in the child's key is the only link between them.

Therefore, there isn't any real impact beyond the ability to do things transactionally. Similarly, siblings have no actual relationship, just a conceptual one.

For example, you can put an entity into the datastore referencing a parent which doesn't actually exist. That is entirely legitimate and oftentimes very useful.
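To make that concrete, here is a minimal sketch in plain Python. It is a toy model, not the datastore API: the `Key` class and the `store` dict are hypothetical stand-ins, used only to show that a child key can embed a parent key without any parent entity being stored.

```python
from typing import NamedTuple, Optional, Union


class Key(NamedTuple):
    """Toy stand-in for a datastore key: optional parent key, kind, and ID."""
    parent: Optional["Key"]
    kind: str
    id: Union[int, str]


# A dict standing in for the datastore: entities are stored flat, keyed by Key.
store = {}

parent_key = Key(None, "Parent", "p1")      # no Parent entity is ever stored
child_key = Key(parent_key, "Child", "c1")  # the child's key embeds the parent key

store[child_key] = {"name": "first child"}  # allowed: the parent need not exist

parent_exists = parent_key in store
child_exists = child_key in store
```

In this model the parent is nothing more than a prefix of the child's key, which mirrors the "conceptual only" relationship described above.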

So, the only difference between option 1 and option 2 is that with option 1 you have to do more heavy lifting and cannot take advantage of transactions or strongly consistent queries.

Edit: The points above do not mention the limit of roughly one write per second per entity group. So, to directly answer the original question: using parent keys limits throughput if you need to write to many entities sharing the same parent key within a second, outside of a single transaction.

1
votes

In general, if you don't need two entities to be updated or read in the same transaction, they should not be in the same entity group, i.e. they should not share a root in their key paths, as they would if one were the key-parent of the other. If they're in the same entity group, then concurrent updates to either entity will contend for the entire group, and some updates may need to be retried.
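As a rough illustration of what "same entity group" means, the sketch below represents key paths as tuples of `(kind, id)` pairs in plain Python. This is an assumed model for illustration only; `entity_group_root` is a hypothetical helper, not a datastore function.

```python
def entity_group_root(key_path):
    """Return the first (kind, id) pair of a key path; it identifies the group."""
    return key_path[0]


# Option 2: children keyed under a parent share that parent's root.
child_a = (("Parent", "p1"), ("Child", "a"))
child_b = (("Parent", "p1"), ("Child", "b"))

# Option 1: each child is a root entity with a composite key name.
flat_a = (("Child", "p1,a"),)
flat_b = (("Child", "p1,b"),)

same_group_nested = entity_group_root(child_a) == entity_group_root(child_b)
same_group_flat = entity_group_root(flat_a) == entity_group_root(flat_b)
```

Here the two parent-keyed children land in one group and would contend on writes, while the two flat-keyed children are each their own group.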

From your question, it sounds like "<parent_id>,<child_id>" is an appropriate key name for the child. If you need to access these IDs separately (such as to get all entities with a particular "<child_id>"), you can store them as indexed properties, and perform queries as needed.
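A minimal sketch of the join and split helpers for such key names, in plain Python. The function names and the choice of "," as the separator are assumptions for illustration; the sketch also assumes the parent ID never contains the separator, and it rejects one that does.

```python
SEPARATOR = ","


def make_key_name(parent_id, child_id):
    """Build the composite key name "<parent_id>,<child_id>"."""
    if SEPARATOR in parent_id:
        # Otherwise the name could not be split back unambiguously.
        raise ValueError("parent_id must not contain %r" % SEPARATOR)
    return parent_id + SEPARATOR + child_id


def split_key_name(key_name):
    """Recover (parent_id, child_id) by splitting on the first separator."""
    parent_id, child_id = key_name.split(SEPARATOR, 1)
    return parent_id, child_id


key_name = make_key_name("p1", "c7")
parent_id, child_id = split_key_name(key_name)
```

Splitting on only the first separator means the child ID may itself contain a comma. If the IDs are also stored as indexed properties, the split is needed only when nothing but the key name is at hand.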