I’m not completely confident in my understanding of indexes.
(comfort level: 87%)
Does parenting an entity affect where its index rows are located on Bigtable tablets, or is that determined purely by kind?
My example:
Normally, it would be a bad idea to index a datetime property if the entity kind is subject to frequent writes.
But if the parent is a fairly uniformly distributed random key, and it is unlikely that two entities of kind Proposed() will share the same parent, would I still have a problem with monotonically increasing index values creating hotspots?
(I'm using App Engine Standard, Python 2.7.)
```python
import datetime
import random

from google.appengine.ext import ndb


#... I have an entity kind like this:
class Proposed(ndb.Model):
    foo = ndb.StringProperty(indexed=True, default=None)
    bar = ndb.IntegerProperty(indexed=True, default=0)
    date = ndb.DateTimeProperty(indexed=True, auto_now_add=True)


#... create a randomly distributed key
# (randint keeps the ID an integer and excludes 0, which is not a valid ID)
random_id = random.randint(1, 9999999999999999)
parent_key = ndb.Key('Papa', random_id)

#... I parent the entity to the random key
p = Proposed(parent=parent_key)
p.foo = 'a ball of string'
p.bar = 42
p.put()

#... and I query using an inequality filter
# (placeholder date range for illustration)
start_date = datetime.datetime(2018, 1, 1)
end_date = datetime.datetime(2018, 2, 1)
q = Proposed.query(ndb.AND(Proposed.bar == 42,
                           Proposed.date >= start_date,
                           Proposed.date < end_date))
```
Documents that seemed to indicate this (ancestor) solution:
https://cloud.google.com/appengine/articles/indexselection
Describes the index hierarchy.
https://cloud.google.com/appengine/docs/standard/python/datastore/indexes#index-definition-structure
"The rows of an index table are sorted first by ancestor and then by property values, in the order specified in the index definition."
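To make the quoted ordering rule concrete, here is a toy model in plain Python, not Datastore code. Each index row is a tuple, and Python's element-by-element tuple comparison stands in for "sorted first by ancestor, then by property values". The row layouts, the `insert_row` helper, and the use of a loop counter as the timestamp are all assumptions for illustration. The model only counts how often a new row lands at the very end of the sorted table; it does not show which index a given query uses or how rows map to tablets.

```python
import bisect
import random

# Toy model only: each index row is a Python tuple, and tuple comparison
# stands in for the index sort order. This is NOT the real storage format.


def insert_row(index, row):
    """Insert `row` in sorted position; return True if it landed last."""
    pos = bisect.bisect(index, row)
    index.insert(pos, row)
    return pos == len(index) - 1


random.seed(0)
with_ancestor = []     # hypothetical rows sorted by (ancestor, bar, date)
without_ancestor = []  # hypothetical rows sorted by (bar, date, entity id)
tail_with = 0
tail_without = 0

for t in range(1000):  # t plays the role of an auto_now_add timestamp
    parent_id = random.randint(1, 9999999999999999)  # random parent ID
    tail_with += insert_row(with_ancestor, (parent_id, 42, t))
    tail_without += insert_row(without_ancestor, (42, t, t))

print("appends at the end, ancestor-first rows: %d" % tail_with)
print("appends at the end, property-first rows: %d" % tail_without)
```

With the property-first layout every new row sorts after all existing ones, while with a random ancestor prefix only a handful do.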
https://cloud.google.com/datastore/docs/best-practices#high_readwrite_rates_to_a_narrow_key_range
"Avoid high read or write rates to Cloud Datastore keys that are lexicographically close. Cloud Datastore is built on top of Google's NoSQL database, Bigtable, and is subject to Bigtable's performance characteristics. Bigtable scales by sharding rows onto separate tablets, and these rows are lexicographically ordered by key"
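The hotspot described in that quote can be sketched with a simplified model: split a key range into a fixed number of equal-width "tablets" and count how many writes each one receives. The tablet count, key range, and fixed split points below are hypothetical; real Bigtable chooses and moves split points dynamically. The point is only that lexicographically close keys all fall into the same range, while scattered keys spread across ranges.

```python
import bisect
import random

NUM_TABLETS = 8          # hypothetical number of tablets
KEY_SPACE = 10 ** 16     # hypothetical key range [0, KEY_SPACE)
WRITES = 10000

# Hypothetical fixed split points dividing the key range into equal slices.
SPLITS = [KEY_SPACE * i // NUM_TABLETS for i in range(1, NUM_TABLETS)]


def tablet_for(key):
    """Return the index of the tablet whose key range contains `key`."""
    return bisect.bisect_right(SPLITS, key)


def load(keys):
    """Count how many writes each tablet receives."""
    counts = [0] * NUM_TABLETS
    for k in keys:
        counts[tablet_for(k)] += 1
    return counts


random.seed(1)
start = KEY_SPACE // 2  # stands in for "now" expressed as a key
sequential = load(start + i for i in range(WRITES))
scattered = load(random.randrange(KEY_SPACE) for _ in range(WRITES))

print("sequential: %s" % sequential)
print("scattered:  %s" % scattered)
```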
Alternate solutions:
1) Create a datetime string property with a random hash prepended.
2) Build the datetime string in reversed order: millisecond:second:minute:hour day:month:year
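Here is a sketch of the two alternate key formats. The helper names, the choice to hash the entity ID, the 4-character MD5 prefix, and the separators are assumptions for illustration; neither format comes from any App Engine API. Both break the link between string order and time order, which is what spreads writes, and which is also why a single inequality filter can no longer select a contiguous date range.

```python
import datetime
import hashlib

# Hypothetical helpers for the two alternate formats; names and exact
# layouts are illustrative assumptions only.


def hashed_prefix_key(dt, entity_id):
    """Option 1: a short hash of the entity ID prepended to an ISO timestamp."""
    prefix = hashlib.md5(str(entity_id).encode("utf-8")).hexdigest()[:4]
    return prefix + "|" + dt.strftime("%Y-%m-%dT%H:%M:%S.%f")


def reversed_key(dt):
    """Option 2: millisecond:second:minute:hour day:month:year."""
    return "%03d:%02d:%02d:%02d %02d:%02d:%04d" % (
        dt.microsecond // 1000, dt.second, dt.minute, dt.hour,
        dt.day, dt.month, dt.year)


base = datetime.datetime(2020, 1, 1)  # arbitrary example start time
times = [base + datetime.timedelta(milliseconds=137 * i) for i in range(50)]

opt1 = [hashed_prefix_key(t, i) for i, t in enumerate(times)]
opt2 = [reversed_key(t) for t in times]

# Is string order still the same as chronological order?
print(opt1 == sorted(opt1))
print(opt2 == sorted(opt2))
```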
I can see how those solutions might work when querying with an equality filter. However, I will be using an inequality filter on date, and I don’t see how to query a range of dates with either method.
Humble thanks!