What are the trade-offs in Cloud Datastore for list property vs multiple properties vs ancestor key?

Question

My application has models such as the following:

import attr

@attr.s
class Employee:
  name = attr.ib(type=str)
  department = attr.ib(type=int)
  organization_unit = attr.ib(type=int)
  pay_class = attr.ib(type=int)
  cost_center = attr.ib(type=int)

It works okay, but I'd like to refactor my application toward a microkernel (plugin) pattern, where there is a core Employee model that might have just the name, and plugins can add other properties. I imagine one possible solution might be:

@attr.s
class Employee:
  name = attr.ib(type=str)
  labels = attr.ib(type=list)

An employee might look like this:

Employee(
   name='John Doe',
   labels=['department:123',
           'organization_unit:456',
           'pay_class:789',
           'cost_center:012']
)

Perhaps another solution would be to just create an entity for each "label" with the core employee as the ancestor key. One concern with this solution is that currently writes to an entity group are limited to 1 per second, although that limitation will go away (hopefully soon) once Google upgrades existing Datastores to the new "Cloud Firestore in Datastore mode":

https://cloud.google.com/datastore/docs/firestore-or-datastore#in_native_mode
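
To make the entity-group concern concrete: a Datastore key is a path of alternating kinds and identifiers, and every entity whose path starts with the same root pair belongs to that root's entity group. The sketch below models key paths as plain tuples; the Label kind and both helper names are assumptions for illustration, not part of any client library.

```python
def label_key_path(employee_id, label_name):
    """Key path for a hypothetical Label entity whose ancestor is an Employee."""
    return ("Employee", employee_id, "Label", label_name)


def entity_group_root(key_path):
    """The first (kind, id) pair of a key path identifies its entity group."""
    return key_path[:2]


label_names = ("department", "organization_unit", "pay_class", "cost_center")
paths = [label_key_path(1, name) for name in label_names]

# Collect the distinct entity groups touched by these label entities.
roots = {entity_group_root(path) for path in paths}
```

All four label entities share a single root, so writes from different plugins for the same employee would count against the same entity-group write limit.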

I suppose an application-level trade-off between the list-property and ancestor-key approaches is that the list approach couples plugins more tightly to the core, whereas the ancestor-key approach has a somewhat more decoupled data schema (though not entirely).

Are there any other trade-offs I should be concerned with, performance or otherwise?

Siva Siva · Accepted Answer · 2019-09-12T16:57:48

Personally, I would go with multiple properties, for many reasons, but it's possible to mix all of these solutions for varying degrees of flexibility as the app requires. The main trade-offs are:

a) You can't do joins in Datastore, so storing related data in multiple entities prevents querying with complex where clauses (ancestor key approach).

b) You can't do range queries if you store numeric and date fields as labels (list property approach).

c) The index could be large and expensive if you index your labels field and only a small subset of the labels actually needs to be indexed.
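
Point b) can be seen without touching Datastore at all. The sketch below assumes labels are stored as 'name:value' strings, as in the question, and that an inequality filter on them compares the strings as text rather than as numbers. The helper name and the sample values are made up for illustration.

```python
def label(name, value):
    """Encode a property as a 'name:value' label string (hypothetical helper)."""
    return f"{name}:{value}"


pay_classes = [9, 80, 789, 1000]
labels = [label("pay_class", value) for value in pay_classes]

# Range filter on a real numeric property: pay_class >= 100.
numeric_hits = [value for value in pay_classes if value >= 100]

# The same filter attempted on label strings, which compare character by character.
lower_bound = label("pay_class", 100)
string_hits = [text for text in labels if text >= lower_bound]
```

Here numeric_hits contains only 789 and 1000, while string_hits contains all four labels, because 'pay_class:9' and 'pay_class:80' sort after 'pay_class:100' as text.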

So, one way to mix all three is:

a) For your static data and application logic, use multiple properties.

b) For dynamic data that is not going to be used for querying, you can use a list of labels.

c) For pluggable data that a plugin needs to query on but doesn't need to join with the static data, you can create another entity that again uses a) and b), so the plugin stores all its related data together.
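
The mix described above might look roughly like the following. This is only a sketch under assumptions: it uses standard-library dataclasses in place of attrs so that it runs on its own, and the PayrollRecord kind, the employee_id field, and the sample label are hypothetical names, not something from the question.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Employee:
    # a) Static data the core queries on: one property each.
    employee_id: int
    name: str
    department: int
    # b) Dynamic data that is never queried: opaque labels,
    #    which could be excluded from indexes.
    labels: List[str] = field(default_factory=list)


@dataclass
class PayrollRecord:
    # c) A plugin-owned kind (hypothetical) that repeats patterns a) and b).
    employee_id: int  # reference to the core entity; there is no join
    pay_class: int    # a real numeric property, so range filters work
    cost_center: int
    labels: List[str] = field(default_factory=list)


john = Employee(employee_id=1, name="John Doe", department=123,
                labels=["badge_color:blue"])
payroll = PayrollRecord(employee_id=john.employee_id, pay_class=789,
                        cost_center=12)

# The plugin filters only its own kind, then looks up core entities by id.
records = [payroll]
high_pay_ids = [record.employee_id for record in records
                if record.pay_class >= 500]
```

The design choice is that anything a plugin must filter or sort on becomes a typed property on the plugin's own kind, and only data that is displayed but never queried goes into labels.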
