I'm writing a GAE-based application that should allow users to filter items by several of their properties. Items are stored as NDB entities. Some of the properties can be matched by standard query filters, but some require a "full" (substring) text search for the whole thing to make sense. Additionally, some sensible ordering is required. It's perhaps best illustrated with the following contrived example:
class Product(ndb.Model):
    manufacturer = ndb.StringProperty()
    model = ndb.StringProperty()
    rating = ndb.IntegerProperty(choices=[1, 2, 3, 4])
    features = ndb.StringProperty(repeated=True, choices=['feature_1', 'feature_2'])
    is_very_expensive = ndb.BooleanProperty()
    categories = ndb.KeyProperty(kind=Category, repeated=True)
Product entities all have the same ancestor as their "container". A product can belong to one or more categories and the latter form a tree.
Now, users should be able to:
- Narrow down products by selecting a category (a single one will suffice)
- Filter them by specifying a minimal rating and desired features
- View exclusively products that are very expensive or those that are not (or view all)
- Search for products by a piece of text from model and/or manufacturer fields
- Have the final list ordered, e.g. by model name (the ability to pick the ordering would be ideal though).
All this at the same time, i.e. filters and ordering should be seamlessly applied when search terms are provided.
The question is: how to achieve such functionality in a performant way using GAE?
There are going to be hundreds of thousands, or perhaps millions, of products in the database. The problem with the Search API, when used together with NDB queries, is that the search results still have to be filtered and perhaps ordered.
Two solutions I've been thinking of:
- Add a repeated StringProperty to the Product model that would contain all searchable substrings (or at least prefixes) of words from the manufacturer and model fields. It's easy and it works, but I'm seriously concerned about performance. In my experiments I got on average 40-50 searchable word prefixes for each Product.
- Use the Search API exclusively for the task, utilizing advanced search queries. E.g. I can store a product's categories (as IDs or paths) in a separate document field and use this field to obtain products belonging to a given category. It can probably be done, but what concerns me here is the limit of 10,000 search results and the various usage limitations/quotas. I'm also not sure about the ordering of results.
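To make the first option concrete, the prefix generation could be sketched roughly as below. This is only an illustration under assumptions: the function name word_prefixes, the min_len cutoff, lowercasing, and splitting on whitespace are all hypothetical choices, not something taken from the actual model.

```python
def word_prefixes(texts, min_len=2):
    """Return the sorted, de-duplicated prefixes of every word in texts.

    texts   -- iterable of strings, e.g. the manufacturer and model values
    min_len -- shortest prefix worth storing (an assumed cutoff)
    """
    prefixes = set()
    for text in texts:
        # Lowercase so that matching can be case-insensitive.
        for word in text.lower().split():
            # Every prefix from min_len characters up to the whole word.
            for end in range(min_len, len(word) + 1):
                prefixes.add(word[:end])
    return sorted(prefixes)


# Hypothetical product: manufacturer "Acme", model "Rocket 9000".
terms = word_prefixes(["Acme", "Rocket 9000"])
```

The resulting list would be stored in the repeated StringProperty, and a lowercased search term could then be matched with a plain equality filter on that property. Note that this yields prefix matching per word, not true substring matching; storing all substrings would grow the list much faster.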
Are there any other ways?