6
votes

App Engine Datastore cannot be queried for an aggregate result.

Example: I have an entity called "Post" with the following fields:

Key id, String nickname, String postText, int score

I have many different nicknames and many posts by each nickname in my datastore.

If I want a leader board of the top ten nicknames of total scores, I would typically have sql as follows:

select nickname, sum(score) as sumscore
from Post 
group by nickname 
order by sumscore 
limit 10

This type of query is not possible in google app engine datastore java api (jdo or jpa).

What are alternative strategies that I could use to achieve a similar result?

Crudely and brutely, I could load every Post entity and compute the aggregation fully in my application code. This is obviously not efficient on large datasets.

What other strategies can I employ?

1
If Google only updated their plugin DataNucleus would do that "crude and brute force" approach for you transparently. The code to do it would only be a handful of lines to their plugin, contributed a year ago ...DataNucleus
@DataNucleus Having built-in support wouldn't make it any faster or more efficient.Nick Johnson
@Nick, of course it wouldn't, but the whole user-experience would be a damn sight more pleasant, and the amount of code people have to write would be less - that is the business we're inDataNucleus
@DataNucleus The point is that if you're trying to do this sort of aggregation at read time, you're probably doing something you shouldn't be. It's rare that doing aggregate queries at read time is the right option unless you're doing OLAP type work, which App Engine isn't built for.Nick Johnson
While that may or not be the case what DataNucleus says is actually good advice since that provides a full implementation of a standard. Means that people have portable code. With the Google take on this standard you can have little bits of it, but then have to rewrite your code for their platform. I for one want to be able to just swap my code to run on MySQL, or GAE and not have to put in all sorts of hacks and am prepared to accept the occasional inefficient query for the benefit of that portabilityuser383680

1 Answers

10
votes

Create a Nickname model, and each time you add a new Post, retrieve the corresponding Nickname and increase a stored score sum there. Essentially, do the computation at insert/update-time, not query-time.