0
votes

I am using GAE python (webapp2) framework for a project and have a question related to ndb query. Let me give an example to explain this clearly:

Here is a ndb model I have:

class Example(ndb.Model):
  userid = ndb.StringProperty()
  field1 = ndb.StringProperty()
  field2 = ndb.StringProperty()

Here are the entities I have created using the above model:

[Example(key=Key('Example', '1~A~B'), field1=u'B', field2=u'A', userid=u'1'),
 Example(key=Key('Example', '1~C~D'), field1=u'D', field2=u'C', userid=u'1'),
 Example(key=Key('Example', '2~A~B'), field1=u'B', field2=u'A', userid=u'2'),
 Example(key=Key('Example', '2~C~D'), field1=u'D', field2=u'C', userid=u'2'),
 Example(key=Key('Example', '3~A~B'), field1=u'B', field2=u'A', userid=u'3'),
 Example(key=Key('Example', '3~X~Y'), field1=u'Y', field2=u'X', userid=u'3'),
 Example(key=Key('Example', '4~C~D'), field1=u'D', field2=u'C', userid=u'4'),
 Example(key=Key('Example', '4~E~F'), field1=u'F', field2=u'E', userid=u'4'),
 Example(key=Key('Example', '5~A~B'), field1=u'B', field2=u'A', userid=u'5'),
 Example(key=Key('Example', '5~X~Y'), field1=u'Y', field2=u'X', userid=u'5')
]

Given the above data I want to find the following: - Find all userids which have both of the following conditions met:

field1='B', field2='A'
and field1='D', field2='C'

In the above example the following userids will match this criteria:

userid='1' 
userid='2'

So the question I have is, Is it possible for us to find the above result using a single ndb query? If so how?

The method I know right now to do this can only be achieved with a OR ndb query where we find all records matching

 Example.query(ndb.OR(ndb.AND(field1='B' AND field2='a'), ndb.AND(field1='D' AND field2='C')))

Then iterate through the results (in python) and then have a logic to extract only those userids which have field1='B', field2='A' and field1='D', field2='C'

This doesn't seem to be an efficient way because of the following reason: - We are unnecessarily extracting more than needed records with the OR combination. - If we limit the results with fetch_page, there is no way to know how many records will be filtered out using our criteria. It could even result in empty records based on our limit.

OR if there is a way for me to first find all userids with field1='B' and field2='A' and then directly query within this result to find all userids with field1='D' and field2='C' that will also solve the problem. So this is querying over a query result.

Appreciate your inputs / suggestions. Thanks.

1

1 Answers

1
votes

Yes, the ndb.OR operation is not very efficient and can cause combinatorial explosion, see the discussion in Combining AND and OR Operations.

Also keep in mind that you're querying for Example elements, not for userids. Which means you can't really query for field1='D' and field2='C' within the results for the field1='B' and field2='A' query - the result will always be empty because field1 cannot simultaneously be both 'D' and 'B'.

So you really need to do separate 2 queries, extract the sets of userids in each of the results and then check for the intersection of the 2 sets.

It might be possible to use distinct projection queries (experimental!) to get the list of users more efficiently:

query_1 =  Example.query(ndb.AND(Example.field1='B' AND Example.field2='A'),
                         projection=[Example.userid], distinct=True))
query_2 =  Example.query(ndb.AND(Example.field1='D' AND Example.field2='C'),
                         projection=[Example.userid], distinct=True))

users_1 = set([example.userid for example in query_1.fetch()])
users_2 = set([example.userid for example in query_2.fetch()])

users_list = list(users_1 & users_2)

Note: I didn't actually try the above code, it's based only on the documentation.