
We are trying to query the App Engine datastore for some general stats using NDB. They don't need to be 100% accurate (i.e., I'm not concerned about eventual consistency); they just need to reflect the approximate number of entities.

With NDB, we are issuing something as simple as:

query = MyModel.query(MyModel.source==source, MyModel.created<=some_time).order(-MyModel.created)
count = query.count(keys_only=True)

This times out after about 60 seconds. We use entity groups and transactions fairly regularly, but I'm hoping those don't affect these count queries. We currently have around 4.2M MyModel entities, though the source filter should narrow this down to perhaps 210,000.

Is there an alternate way to count numbers of this magnitude, without a bunch of custom memcache-y logic? Remember, the numbers don't need to be exact, just "generally correct".

Typo: of course there should have been a MyModel.query().filter()... in there... - Jason A. Collins
You can just edit the post. Also you can specify your filters as kwargs to query to achieve the same effect. Why do you need order if you are just counting? - bossylobster
You're right, the order is irrelevant to the count, except that I use it to target a custom index that already exists in my index.yaml, so I don't have to create a new one. - Jason A. Collins
Very nice. I'm not sure if this has any impact on the count query though. It'd be nice to find out; I'll see what I can do. - bossylobster

1 Answer


I believe the count limit, previously 1000 entities, has now been removed. The practical limit is therefore how many entities can be counted before the request times out.

Some similar questions have been asked and normally Sharded Counters are brought up at this point.
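For reference, a sharded counter keeps a running total spread across several small "shard" entities, so the count is maintained at write time instead of being computed by scanning matching entities at query time. The sketch below is a minimal in-memory illustration of the idea, not App Engine code: a Python list stands in for the shard entities, and the class name `ShardedCounter` and the default of 20 shards are hypothetical choices.

```python
import random

NUM_SHARDS = 20  # hypothetical shard count; more shards spread writes further


class ShardedCounter(object):
    """In-memory stand-in for a sharded counter (illustration only).

    On App Engine each shard would be its own datastore entity, so concurrent
    increments rarely contend on the same one. Here a list plays that role.
    """

    def __init__(self, num_shards=NUM_SHARDS):
        self.shards = [0] * num_shards

    def increment(self, amount=1):
        # Pick a random shard so concurrent writes are spread out.
        index = random.randrange(len(self.shards))
        self.shards[index] += amount

    def total(self):
        # Reading the count is a small, bounded read: sum every shard.
        return sum(self.shards)
```

Because a counter like this only tracks a running total, the question's case would need one counter per `source`, incremented whenever a MyModel is created; it cannot by itself answer the `created <= some_time` cutoff without extra bookkeeping.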

But I think you will probably be better off running your count either as a task (where the timeout is 10 minutes) or on a backend (which has no request timeout).

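As Guido notes in the comments, however, a single query cannot take longer than 60 seconds, so even in a task or on a backend the count needs to be done in batches rather than as one count() call.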
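One way to count in batches is to walk the query page by page with cursors and add up the page sizes, so no single query has to cover all ~210,000 entities. NDB's `Query.fetch_page(page_size, start_cursor=..., keys_only=True)` returns a `(results, cursor, more)` tuple; the sketch below takes any function with that shape, and uses a hypothetical in-memory stand-in (`make_fake_fetch_page`, where the "cursor" is just a list offset) so it runs outside App Engine. The `time_budget` parameter is an assumption of this sketch: when it runs out, the function returns the partial count and cursor so a follow-up task could resume from there.

```python
import time


def count_in_batches(fetch_page, batch_size=1000, start_cursor=None,
                     start_count=0, time_budget=None, clock=time.time):
    """Count query results page by page.

    fetch_page(batch_size, cursor) must return (results, cursor, more), the
    same shape as NDB's Query.fetch_page(). Returns (count, cursor, done).
    If time_budget (seconds) runs out first, done is False and the caller
    can resume later by passing cursor and count back in.
    """
    count = start_count
    cursor = start_cursor
    started = clock()
    while True:
        results, cursor, more = fetch_page(batch_size, cursor)
        count += len(results)
        if not more:
            return count, cursor, True
        if time_budget is not None and clock() - started >= time_budget:
            return count, cursor, False


def make_fake_fetch_page(keys):
    """Hypothetical in-memory stand-in: the 'cursor' is a list offset."""
    def fetch_page(page_size, start_cursor):
        start = start_cursor or 0
        page = keys[start:start + page_size]
        end = start + len(page)
        return page, end, end < len(keys)
    return fetch_page
```

Against real NDB, the adapter would presumably be something like `lambda size, cursor: query.fetch_page(size, start_cursor=cursor, keys_only=True)`, with the query from the question; that wiring is untested here. Keeping pages small means each individual query stays well under the 60-second limit, while the task's longer deadline (or a chain of tasks carrying the cursor and partial count) covers the whole scan.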

EDIT: The 1000-entity limit was removed some time ago, in release 1.3.6. From the release notes:

The Datastore no longer enforces a 1000 entity limit on for count and offset. Queries using these will now safely execute until they return or your application reaches the request timeout limit.