In an App Engine NDB model, do references to related models need to be explicitly cached to minimize query costs and optimize performance?

Question

The App Engine documentation on NDB caching indicates that caching is enabled by default:

NDB automatically caches data that it writes or reads (unless an application configures it not to).

I hope this means that I can rely on it to manage key-related models in a cost-effective and performant way. Here's a simple example involving two models with a one-to-many relationship.

A user model (has many comments):

from google.appengine.ext import ndb

class User(ndb.Model):
    name                    = ndb.StringProperty(required=True)
    email                   = ndb.StringProperty(required=True)

    def comments(self, limit=25):
        return UserComment.query(UserComment.user_key == self.key) \
                          .order(-UserComment.created_at) \
                          .fetch(limit)

A comment model (each comment belongs to a user):

class UserComment(ndb.Model):
    user_key                = ndb.KeyProperty(required=True)
    text                    = ndb.StringProperty(required=True)
    created_at              = ndb.DateTimeProperty(auto_now_add=True)

    @property
    def user(self):
        return self.user_key.get()

And a template where a comment is displayed and includes two references to comment.user:

<div class="comment">
  <div class="body">
    {{ comment.text }}
  </div>
  <div class="footer">
    by {{ comment.user.name }} ({{ comment.user.email }})
  </div>
</div>

Is this a sane pattern? Will each reference to comment.user.name and comment.user.email incur a separate query cost or can the automatic NDB cache be trusted to avoid or minimize this?

Similarly, with the User.comments method, can automated caching be trusted to minimize costs? Or is it advisable to add code that explicitly uses memcache?

Dan Cornilescu · Accepted Answer · 2016-04-09T19:08:25

NDB caching covers entities themselves. From the doc you mentioned:

NDB automatically caches data that it writes or reads (unless an application configures it not to). Reading from cache is faster than reading from the Datastore.

This means that, by default, you shouldn't need to handle caching manually for direct key lookups such as the ones behind comment.user.name and comment.user.email; NDB will take care of that.

However, queries are a different story: NDB's automatic cache stores entities fetched by key, not query results. There is no way to know whether the data returned by a query is still the valid response when the same query is repeated later, because additional data may have been created in the meantime. Caching query results is the very first memcache use mentioned in the documentation:

One use of a memory cache is to speed up common datastore queries. If many requests make the same query with the same parameters, and changes to the results do not need to appear on the web site right away, the app can cache the results in the memcache. Subsequent requests can check the memcache, and only perform the datastore query if the results are absent or expired. Session data, user preferences, and any other queries performed on most pages of a site are good candidates for caching.

In other words, you should consider manually caching things like User.comments, which rely on queries for their return value.
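As a rough illustration of that advice, here is a minimal cache-aside sketch. The helper names (cached_comments, invalidate_comments, comments_cache_key), the key format, and the 60-second expiry are all hypothetical choices, not anything prescribed by NDB. The cache argument is assumed to expose get(key), set(key, value, time=seconds) and delete(key), which is the shape of google.appengine.api.memcache; DictCache is only an in-memory stand-in so the sketch can run outside App Engine.

```python
# Cache-aside sketch for query results. All names here are hypothetical.
# On App Engine, pass google.appengine.api.memcache as `cache`.

CACHE_SECONDS = 60  # assumed tolerance for stale comment lists


def comments_cache_key(user_id, limit):
    # One cache entry per (user, limit) pair.
    return 'user_comments:%s:%d' % (user_id, limit)


def cached_comments(cache, user_id, limit, run_query):
    """Return cached results if present, else run the query and store them."""
    key = comments_cache_key(user_id, limit)
    results = cache.get(key)
    if results is None:  # miss or expired; an empty list is a valid hit
        results = run_query(limit)
        cache.set(key, results, time=CACHE_SECONDS)
    return results


def invalidate_comments(cache, user_id, limit=25):
    """Call after writing a new comment so the next read re-runs the query."""
    cache.delete(comments_cache_key(user_id, limit))


class DictCache(object):
    """In-memory stand-in for memcache; it ignores the expiry time."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, time=0):
        self._data[key] = value
        return True

    def delete(self, key):
        self._data.pop(key, None)
```

Inside User.comments, this could presumably be wired up by passing memcache as the cache, self.key.id() as user_id, and a small function wrapping the existing UserComment query as run_query. Whether a short expiry alone is enough, or invalidate_comments should also be called whenever a comment is written, depends on how quickly new comments need to appear.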
