2
votes

I am trying to query the datastore for the top 100 users in terms of points scored, who have logged on in the past week (the date field).

List<User> users = ofy().load().type(User.class)
   .filter("date >", date).order("date")
   .order("-points").limit(100).list();

It seems to ignore the final ordering by points and returns the list sorted by date instead. If I remove the date filter and sort then I get list nicely sorted by points, but including users who have logged on more than a week ago.

I have read the documentation carefully and it seems to allow queries that include both an inequality filter and multiple sorts.

Any ideas what I am doing wrong?

Here are some relevant notes from the documentation:

Because of the way the App Engine Datastore executes queries, if a query specifies inequality filters on a property and sort orders on other properties, the property used in the inequality filters must be ordered before the other properties.

... if a query specifies one or more inequality filters along with one or more sort orders, the first sort order must refer to the same property named in the inequality filters.

3

3 Answers

4
votes

What you have observed is that standard expected behaviour for index based queries as used by app engine. While filtering, if you apply inequality filter(which can be used only on one property in a query) , then when you have multiple sort orders, the first ordering will be for that property and then further ordering can be based on other properties. For doing a query based on an inequality filter for date and sorting by points, datastore will be using and index like below, where the date property is in ascending or descending order:

day 1 -  100
day 2 -  30
day 2 - 90
day 2- 10
day 3 - 50
day 4 - 40
day 5 - 60 

Now if you do a query with inequality filter of day > day1 , then the query will search the above index and return below results, which will already be sorted in terms of the date, even if you dont mention explicitly:

day 2 -  30
day 2 - 90
day 2- 10
day 3 - 50
day 4 - 40
day 5 - 60

Now if you are doing a query with inequality filter on date and add a sort order on points, then it will be like applying an additional sort on the above result which is already sorted by date. That is why you are forced to explicitly mention date as first sort order( because its already present by default) and then mention points as second sort order. The result will like below. See the sorting done for day 2:

day 2 -  10
day 2 - 30
day 2- 90
day 3 - 50
day 4 - 40
day 5 - 60

So if you want to achieve your logic, you need to retrieve data from app engine and do some additional sorting like below:

1, Fetch with date inequality filter and then in your clientside do a proper sorting based on points to get the top 100.

2, Fetch the top results(around 300) based on a descending order index for points and then filter them based on the date in your clientside to get the desired 100.

2
votes

You may want to consider an alternate approach. This is a lot of index overhead which will cause your costs to be higher, the response time for the handler executing this function to operate an order of magnitude slower, and you will have moments where the eventual consistency of index updates will affect maintenance of this data. If you have a busy site, you will surely not be happy with the latency and costs associated with this approach.

There are a number of alternate approaches. Your expected site transactions per second would affect which you choose. Here is a very simple alternative. Create an ndb entity with a TextProperty. Serialize the top scores entries using a string such as score_userid. Store them in the text field by joining them with a unique character. When a new score comes in, use get_by_id to retrieve this record (ndb automatically handles memcaching for you). Split it into an array. Split the last element of the array, and check against the new score. If it is less than the score, drop it, and append the new score_userid string to the array. Sort the array, join it, and put() the new TextProperty. If you want you could set up an end of the day cron to scan your scores for the day to check to see if your process was affected by the very small chance that two scores arrived at nearly the same time causing one to overwrite the other. HTH. -stevep

2
votes

Thanks tony for taking the time for your great explanations.

A third more complex but "cleaner" option is also possible

(yes because you suppose that in the 300 best results, there will be 100 new, so it matematically is incoherent beyond the fact that is uses some bandwith (if in gwt) and resources).

The third solution is to have a separate table with always at max 100 results sorted by score.

You can have these 100 rows always in memory.

Before saving any new entry check

  • if the 100 memory rows are in the date limit and remove the old rows (supposing you have only one date option: eg. best of month)
  • if the current score to save is better that the lowest in 'up to date' entries, add it to the memory cache.

Now if you have more than one date option: ex: best of day, best of week, best of month Add a column to specify in which category the top 100 element is (and have 300 elements instead of 100, or have a multiple value column and keep 100 entries (with more code complexity))

Best regards,