7 votes

I'm having some trouble with the Google App Engine datastore. Ever since the new pricing model was introduced, the cost of running my app has increased massively.

The culprit appears to be "Datastore small operations", which come in at more than 20 million ops per day!

Has anyone else had this problem? I don't think I'm doing an excessive number of key lookups, and I only have 5000 users, with roughly 10-20 requests per minute.

Thanks in advance!

Edit

OK, I've got some stats; these are after about 3 hours. Here is what I am seeing in my dashboard, in the billing section: [screenshot: App Engine dashboard - billing]

And here are some of the stats:

[screenshot: stats]

Obviously there are quite a lot of calls to datastore.get, and I am starting to think that my design is causing the problem. Those gets correspond to accounts. Every user has an account, but an account can be one of two types; for this I use composition, so each account entity holds a link to its sub-account entity. As a result, when I do a search for nearby users, it involves fetching the accounts with a query and then doing a get on each account to load its sub-account. The top request in the stats picture is a call that fetches 100 accounts and then has to do a get on each one. I would have thought that this was a very light query, but apparently not. And I am still confused by the number of datastore small ops being recorded in my dashboard.
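
Roughly, the nearby-users request looks like this (the model names are simplified stand-ins for my real models):

from google.appengine.ext import db

class SubAccount(db.Model):          # simplified stand-in
    score = db.IntegerProperty()

class Account(db.Model):
    region = db.StringProperty()
    sub_account = db.ReferenceProperty(SubAccount)

# The nearby-users request, more or less:
accounts = Account.all().filter('region =', 'example-region').fetch(100)
for account in accounts:
    sub = account.sub_account   # dereferencing this does a db.get per account,
                                # so ~100 gets on top of the query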

Out of curiosity, what was your typical monthly bill before and after? – Dave
My daily budget was $2, and I never hit it. Now it is $5 and I am exceeding it every day; I think I would have to increase it to $9 a day. – Theblacknight
Sorry, I also should have asked this, but are you using memcache at all? – Dave
No, I haven't really looked into memcache. I would have thought the datastore could handle the current amount of data for a much more reasonable price. Having said that, it's not a site I'm running; it's the backend for an app (a game), so it is quite heavy on processing. – Theblacknight
"The top request in the stats picture is a call that gets 100 accounts, and then has to do a get on each one." You should be fetching all 100 keys in one batch rather than doing individual gets. See here for an explanation of the pattern. Also, you should definitely be keeping frequently accessed entities in memcache to reduce datastore lookups.Drew Sears

4 Answers

11 votes

Definitely use AppStats as Drew suggests; regardless of what library you're using, it will tell you what operations your handlers are doing. The most likely culprits are keys-only queries and count operations.
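
For example (assuming the Python db API; the model name is just a placeholder), both of these are billed as small datastore operations rather than reads:

from google.appengine.ext import db

class Account(db.Model):           # placeholder model
    name = db.StringProperty()

keys = Account.all(keys_only=True).fetch(1000)   # keys-only query: small ops
total = Account.all().count(limit=1000)          # count() is a keys-only scan under the hood: small ops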

9 votes

My advice would be to use AppStats (Python / Java) to profile your traffic and figure out which handler is generating the most datastore ops. If you post the code here we can potentially suggest optimizations.
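
If you're on the Python runtime, enabling it is roughly this (the Java setup uses a servlet filter instead):

# appengine_config.py -- wraps every WSGI app in the AppStats recorder.
def webapp_add_wsgi_middleware(app):
    from google.appengine.ext.appstats import recording
    return recording.appstats_wsgi_middleware(app)

Then enable the appstats builtin in app.yaml and browse to /_ah/stats to see per-request RPC counts.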

1 vote

Don't scan your datastore; use get(key), get_by_id(id), or get_by_key_name(keyname) as much as you can.
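
For instance (Python db API, hypothetical model), a direct lookup by key name avoids the query and index scan entirely:

from google.appengine.ext import db

class Account(db.Model):                 # hypothetical model
    name = db.StringProperty()

Account(key_name='alice', name='Alice').put()

# A query has to scan an index before returning the entity:
account = Account.all().filter('name =', 'Alice').get()

# A lookup by key name is a single cheap get, no index scan:
account = Account.get_by_key_name('alice')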

1 vote

Do you have lots of ReferenceProperty properties in your models? Accessing one triggers a db.get for each entity unless you prefetch the references. With the fetch below, that is 100 extra db.get requests on top of the query itself (a prefetch sketch follows the example).

from google.appengine.ext import db

class Foo(db.Model):
    user = db.ReferenceProperty(User)

foos = Foo.all().fetch(100)
for f in foos:
    print f.user.name  # dereferencing f.user does a separate db.get() for each Foo
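
A rough sketch of the prefetch pattern (it assumes every reference is set and its target still exists):

from google.appengine.ext import db

def prefetch_refprops(entities, *props):
    # Resolve the given ReferenceProperty fields for all entities
    # with one batched db.get() instead of one get per attribute access.
    fields = [(entity, prop) for entity in entities for prop in props]
    ref_keys = [prop.get_value_for_datastore(entity) for entity, prop in fields]
    ref_entities = dict((e.key(), e) for e in db.get(list(set(ref_keys))))
    for (entity, prop), ref_key in zip(fields, ref_keys):
        prop.__set__(entity, ref_entities[ref_key])
    return entities

foos = prefetch_refprops(Foo.all().fetch(100), Foo.user)
for f in foos:
    print f.user.name  # no extra gets; the users were fetched in one batch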