In my scenario there are three main kinds of entities in the datastore: Users, Objects and Keywords. The problem is modelling the relationships between objects and the keywords that classify them.

Users can create as many objects as they want, and the User-Object relationship can easily be implemented with ancestor links.
I modelled inter-object and inter-user relationships by defining an entity for each relationship. This way I can have all the fan-in and fan-out I want, and a relationship search scales with the result set, so it is time-efficient too.

Now I need to classify objects by keyword. Each object can be associated with a limited number of keywords, while a keyword must have no fan-out limit (it can be attached to any number of objects).
What is the most efficient way to implement this, in terms of time efficiency (complexity, ...) and database activity?

A first method could be to assign each object a list of keywords:
the search will scale with the result set, so it will not depend on the total number of relationships and keywords.
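To make this first option concrete, here is a minimal in-memory sketch in plain Python. It is not datastore code: `ObjectEntity`, `ListPropertyStore` and `find_by_keyword` are hypothetical names, and the `index` dictionary only imitates the idea of a per-value index on a list property.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ObjectEntity:
    """Hypothetical stand-in for an Object entity holding a keyword list."""
    object_id: str
    keywords: list = field(default_factory=list)


class ListPropertyStore:
    """Toy in-memory store; `index` imitates a per-value property index."""

    def __init__(self):
        self.objects = {}
        self.index = defaultdict(set)  # keyword -> ids of objects carrying it

    def put(self, obj):
        # Rewriting an entity drops its old index rows and adds the new ones.
        old = self.objects.get(obj.object_id)
        if old is not None:
            for kw in old.keywords:
                self.index[kw].discard(obj.object_id)
        self.objects[obj.object_id] = ObjectEntity(obj.object_id, list(obj.keywords))
        for kw in obj.keywords:
            self.index[kw].add(obj.object_id)

    def find_by_keyword(self, keyword):
        # Only the matching ids are touched, not the whole set of objects.
        return sorted(self.index.get(keyword, ()))


store = ListPropertyStore()
store.put(ObjectEntity("o1", ["red", "round"]))
store.put(ObjectEntity("o2", ["red"]))
store.put(ObjectEntity("o3", ["blue"]))
print(store.find_by_keyword("red"))
```

Note that changing the keywords of one object means putting the whole object again, which is where the "database activity" part of my question comes from.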

A second method could be to model both keywords and keyword-object relationships as entities, as in the inter-user and inter-object case:
the search will again scale with the result set, and thus again does not depend on the size of the network.
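The second option can be sketched the same way, again as plain in-memory Python rather than real datastore code. `KeywordObjectLink` and `LinkStore` are hypothetical names; the two dictionaries stand in for indexes on the link's keyword and object fields.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class KeywordObjectLink:
    """Hypothetical relationship entity: one instance per keyword-object pair."""
    keyword: str
    object_id: str


class LinkStore:
    """Toy in-memory store for link entities, indexed in both directions."""

    def __init__(self):
        self.links = set()
        self.by_keyword = defaultdict(set)  # keyword -> object ids
        self.by_object = defaultdict(set)   # object id -> keywords

    def add_link(self, keyword, object_id):
        # Adding a keyword creates one small entity; the Object is untouched.
        link = KeywordObjectLink(keyword, object_id)
        self.links.add(link)
        self.by_keyword[keyword].add(object_id)
        self.by_object[object_id].add(keyword)
        return link

    def remove_link(self, keyword, object_id):
        self.links.discard(KeywordObjectLink(keyword, object_id))
        self.by_keyword[keyword].discard(object_id)
        self.by_object[object_id].discard(keyword)

    def objects_for(self, keyword):
        return sorted(self.by_keyword.get(keyword, ()))

    def keywords_for(self, object_id):
        return sorted(self.by_object.get(object_id, ()))


links = LinkStore()
links.add_link("red", "o1")
links.add_link("round", "o1")
links.add_link("red", "o2")
print(links.objects_for("red"))
print(links.keywords_for("o1"))
```

Here a keyword can point to any number of objects, and lookups in either direction still touch only the matching links.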

What can I use as comparison criteria?

1 Answer

If you put a list of keywords (or their ids) in your Object entity, you will incur additional write costs: adding a keyword results in an update of the Object entity, which requires a write for the entity and a write for each indexed property, including a write for each keyword in the list.

If this happens rarely, this will be a small extra cost, and I would recommend the list approach for its simplicity. On the other hand, if keywords are added/removed more frequently, the costs will quickly add up.

With the Keyword_Object entity you avoid the extra cost of updating Object entities, but you have to maintain another entity kind, and your stored data will take more space (an extra key for each keyword-object pair).
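As a rough way to compare the two, the counting rule described above (one write for the entity plus one write per indexed property value) can be put into a small Python sketch. This is a simplified model for illustration only, not the actual billing formula; the function names and the assumption that a Keyword_Object entity has two indexed properties are hypothetical.

```python
def list_update_writes(other_indexed_props, keywords_after_update):
    """Writes to re-put an Object after changing its keyword list.

    Simplified model: 1 write for the entity, 1 per other indexed property,
    and 1 per keyword now stored in the list.
    """
    return 1 + other_indexed_props + keywords_after_update


def link_insert_writes(link_indexed_props=2):
    """Writes to create one Keyword_Object entity.

    Simplified model: 1 write for the entity plus 1 per indexed property
    (assumed here to be the keyword and the object reference).
    """
    return 1 + link_indexed_props


# An Object with 3 other indexed properties and 9 keywords gets a 10th keyword.
print(list_update_writes(other_indexed_props=3, keywords_after_update=10))  # 14
print(link_insert_writes())  # 3
```

Under this model the list approach gets more expensive per change as the Object grows, while the cost of adding a link stays constant. How much that matters depends on how often keywords change.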

I recommend going with either of these approaches and optimizing later, when more data is available, unless you are certain to have millions of records very soon, and you already know your data access patterns.