0
votes

Given the entities below in the Google App Engine datastore, is it better to define an index on reportingIds, or to define a separate entity that has only the personId and reportingIds fields?
From the documentation, I understand that defining an index increases the number of operations counted against the datastore quota.

Below are the entities in GAE Go. My code needs to scan through Person entities frequently, and it should limit the scan to Person entities that have at least one reporting person. I see two approaches:

  • Define index on reportingIds and Query by specifying filters.
  • Create/update a PersonWithReporters entity whenever a Person gets a new reporting person.
In the second case, my code iterates through all the PersonWithReporters entities and does not need to construct any index or query. I can iterate by key, which is always guaranteed to return the latest data. I am not sure which approach is better in terms of datastore operations counted against the quota limit.
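type Person struct {
    Id string // unique person id
    // many other personal details, personal settings, etc.
    // Exported so the datastore saves it; the tag keeps the property name reportingIds.
    ReportingIds []string `datastore:"reportingIds"` // ids of the Persons this person manages
}

type PersonWithReporters struct {
    Id string // Person managing reportees
    // Exported so the datastore saves it; the tag keeps the property name reportingIds.
    ReportingIds []string `datastore:"reportingIds"` // ids of the Persons this person manages
}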

2 Answers

1
votes

The approach with a separate entity gives you two advantages.

  1. As you have already mentioned, you don't need to index/query all Person entities.

  2. Every time a Person gets a new reporting person, you will create a new entity, which may be significantly cheaper than updating a Person entity which has many other properties, some of which, presumably, are indexed.

However, your approach with a separate entity is not ideal either. When you index a property with multiple values, under the hood the Datastore creates an index entry for each value. So, when you add reporting person number 3 to this entity, you have to update 3 index entries instead of 1.
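To make the per-value index cost concrete, here is a minimal in-memory sketch. It is only a model: IndexEntry and IndexEntriesFor are hypothetical names, not part of the Datastore API, and the real index storage is more involved. It just shows that an entity with N values in an indexed list property carries N index entries.

```go
package main

import "fmt"

// IndexEntry is a hypothetical, simplified model of one row in a
// single-property index: (property value, key of the owning entity).
type IndexEntry struct {
	Value     string
	EntityKey string
}

// IndexEntriesFor returns one index entry per value of a multi-valued
// property, mirroring the "one entry per value" behaviour described above.
func IndexEntriesFor(entityKey string, values []string) []IndexEntry {
	entries := make([]IndexEntry, 0, len(values))
	for _, v := range values {
		entries = append(entries, IndexEntry{Value: v, EntityKey: entityKey})
	}
	return entries
}

func main() {
	reporters := []string{"alice", "bob"}
	fmt.Println(len(IndexEntriesFor("manager-1", reporters))) // 2

	// Adding a third reporter means the entity now carries three entries.
	reporters = append(reporters, "carol")
	fmt.Println(len(IndexEntriesFor("manager-1", reporters))) // 3
}
```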

You can optimize your data model even further by creating a Reporter entity with no properties! Every time a new reporting person is added, you create this Reporter entity with ID set to the ID of a reporting person, and make it a child entity of a Person entity representing a person to whom this reporter reports.

Now, when you need to iterate through all persons with someone reporting to them, you run a simple query on this Reporter entity, with no filters. This query can be set to keys-only (there is nothing other than a key in this entity anyway, but keys-only queries are treated differently: they are basically free).

For every entity returned by this query you retrieve its key, and this key contains an ID (which is an ID of a reporting person), and a parent key, which includes an ID of a person who this reporter reports to.
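The key-walking step can be sketched as follows. This is an in-memory model, not Datastore code: the Key struct and the helpers NewReporterKey and ManagersFromKeys are hypothetical stand-ins. In the App Engine Go datastore package the equivalent information would come from the key's Parent() and StringID() methods on the results of a keys-only query.

```go
package main

import (
	"fmt"
	"sort"
)

// Key is a simplified stand-in for a datastore key:
// a kind, a string ID, and an optional parent key.
type Key struct {
	Kind   string
	ID     string
	Parent *Key
}

// NewReporterKey builds the key of a property-less Reporter entity.
// Its ID is the reporter's ID and its parent is the manager's Person key.
func NewReporterKey(managerID, reporterID string) *Key {
	return &Key{
		Kind:   "Reporter",
		ID:     reporterID,
		Parent: &Key{Kind: "Person", ID: managerID},
	}
}

// ManagersFromKeys groups reporter IDs by manager ID using key data only,
// which is all a keys-only query would return.
func ManagersFromKeys(keys []*Key) map[string][]string {
	byManager := make(map[string][]string)
	for _, k := range keys {
		if k == nil || k.Parent == nil {
			continue // not a child key; skip it
		}
		byManager[k.Parent.ID] = append(byManager[k.Parent.ID], k.ID)
	}
	return byManager
}

func main() {
	keys := []*Key{
		NewReporterKey("m1", "r1"),
		NewReporterKey("m1", "r2"),
		NewReporterKey("m2", "r3"),
	}
	byManager := ManagersFromKeys(keys)

	// Sort manager IDs so the output order is deterministic.
	managers := make([]string, 0, len(byManager))
	for id := range byManager {
		managers = append(managers, id)
	}
	sort.Strings(managers)
	for _, id := range managers {
		fmt.Println(id, byManager[id])
	}
}
```

Every manager ID that appears as a map key has at least one reporter, so the map keys are exactly the set of persons the scan needs to visit.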

1
votes

Unless App Engine's datastore in Go is very different from how it works in Java or Python, you cannot index an array natively, so option 1 is out of the question, and so is option 2.

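I suggest a third option, which is to define:

type PersonWithReporters struct {
    Id string // concatenate(managing_Person_id, separator, reporter_Person_id) to avoid id collisions
    // Fields are exported so the datastore saves them; the tags keep the property names used below.
    ReportingId string `datastore:"reportingId"` // indexed
    ManagingId  string `datastore:"managingId"`  // probably indexed as well
}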

You would create one of these entities per manager/reporter pair instead of a single entity with an array, and add an index on reportingId. You can then run a filter query on this entity to retrieve the desired information.
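A minimal in-memory sketch of this one-entity-per-pair model follows. The names ReportingLink, NewReportingLink, ReportersOf, and the "|" separator are assumptions made for illustration. ReportersOf only imitates what an equality filter on managingId would return; it is not a datastore query.

```go
package main

import (
	"fmt"
	"strings"
)

// idSeparator is an assumed separator; pick one that cannot occur in person IDs.
const idSeparator = "|"

// ReportingLink is a hypothetical name for one manager/reporter pair.
type ReportingLink struct {
	ID          string // managing ID + separator + reporter ID
	ReportingID string
	ManagingID  string
}

// NewReportingLink builds a link whose ID is unique per manager/reporter pair.
func NewReportingLink(managingID, reporterID string) ReportingLink {
	return ReportingLink{
		ID:          strings.Join([]string{managingID, reporterID}, idSeparator),
		ReportingID: reporterID,
		ManagingID:  managingID,
	}
}

// ReportersOf imitates an equality filter on the managing ID and
// returns the IDs of everyone reporting to that manager.
func ReportersOf(links []ReportingLink, managingID string) []string {
	var reporters []string
	for _, l := range links {
		if l.ManagingID == managingID {
			reporters = append(reporters, l.ReportingID)
		}
	}
	return reporters
}

func main() {
	links := []ReportingLink{
		NewReportingLink("m1", "r1"),
		NewReportingLink("m1", "r2"),
		NewReportingLink("m2", "r3"),
	}
	fmt.Println(links[0].ID)              // m1|r1
	fmt.Println(ReportersOf(links, "m1")) // [r1 r2]
}
```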

I would worry more about performance than about the quota limits, which are pretty high. Just implement it, see how it works, and then check whether quota is really your main concern.