3 votes

From NDB Caching:

NDB manages caches for you. There are two caching levels: an in-context cache and a gateway to App Engine's standard caching service, memcache. Both caches are enabled by default for all entity types, but can be configured to suit advanced needs.

My app doesn't make any ndb caching configuration changes, so it must be using the defaults: both caching levels enabled.
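
For reference, the defaults can be tuned per model class or per call - a minimal sketch based on the documented ndb policy switches, with a made-up model name:

from google.appengine.ext import ndb

class VolatileEntity(ndb.Model):  # hypothetical kind used in these tests
    _use_cache = True      # in-context cache level (the default)
    _use_memcache = True   # memcache level (the default)

# the same switches also exist per call:
# entity = some_key.get(use_cache=False, use_memcache=False)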

I'm running some tests on my staging environment (a separate, dedicated GAE project) where I can totally isolate sequences of activity from any spurious external requests.

Each sequence of activity consists of a cascade of push tasks triggering each other, creating some hundreds of volatile entities, modifying some of them a variable number of times, reading all of them a variable number of times and finally deleting them all.
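
Schematically, each task does something like this (a simplified sketch; the handler URL, params and bound are made up):

from google.appengine.api import taskqueue

LAST_STEP = 100  # made-up bound for the cascade

def run_step(step):
    # ... create / modify / read / finally delete volatile entities ...
    if step < LAST_STEP:
        # enqueue the next push task, continuing the cascade
        taskqueue.add(url='/tasks/step', params={'step': step + 1})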

Only a handful of other already existing entities are accessed during the sequence, all of them persisting after the sequence ends. The number of accesses to the persistent entities should be significantly lower than the number of accesses to the volatile ones.

The vast majority of read operations are entity lookups by key or by ID, with the keys/IDs obtained from keys_only queries or from other related entities.
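
In other words, the typical read pattern is along these lines (reusing the hypothetical kind and made-up property names):

seq_id = 'run-42'  # made up
keys = VolatileEntity.query(VolatileEntity.seq == seq_id).fetch(keys_only=True)
entities = ndb.get_multi(keys)  # plain lookups by key, eligible for ndb's caches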

I'm not using entity ancestry. Most of these tasks perform cross-group transactions, and I do see quite frequent transaction failures/retries due to data contention on a few "hot" entities (I'd estimate some 200-400 such failures in this run; they're hard to count in the Stackdriver log page).
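
The transactions themselves are ordinary ndb cross-group ones, roughly like this (simplified; function and field names are made up):

@ndb.transactional(xg=True)
def bump_counters(key_a, key_b):
    a, b = key_a.get(), key_b.get()  # reads inside the transaction
    a.counter += 1
    b.counter += 1
    ndb.put_multi([a, b])  # contention on "hot" entities surfaces as commit failures/retries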

After executing one such ~20-minute sequence fresh after the daily quota reset, the app's dashboard shows 3x more Cloud Datastore Read Operations (0.03 Millions) than Cloud Datastore Entity Writes (0.01 Millions). The number of volatile entities is indicated by the Cloud Datastore Entity Deletes (0.00089 Millions), if that matters.

The memcache hit rate was 81%, but I don't know if that covers my app's explicit memcache use only or if it includes ndb's memcache use as well.
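
The figure itself comes from the standard memcache stats, e.g. (get_stats() is part of the documented memcache API):

from google.appengine.api import memcache

stats = memcache.get_stats()
# whether these counters include ndb's own memcache traffic is exactly the question
hit_rate = 100.0 * stats['hits'] / (stats['hits'] + stats['misses'])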

Some earlier, similar measurements in a less clean environment produced similar results; I did this clean one as verification.

These observations appear to suggest that entity reads served from the cache still count as datastore reads. But here I'm assuming that:

  • there weren't many memcache evictions during those ~20 min (the app doesn't have a dedicated memcache bucket).
  • each write op invalidates the cache, requiring one datastore read to re-populate it, but subsequent read ops should come from the cache (until the next write op); if cached reads aren't counted, the overall read and write counts should end up roughly comparable (N writes would trigger only about N re-populating reads)
  • my (fairly complex) app code really does what I'm describing :)

I didn't find anything about this in the docs, so I'm wondering if someone knows whether cached ndb reads indeed count as datastore reads, can point out flaws in my interpretation, or can point me to some official documentation on the subject.


2 Answers

3 votes

The memcache hit rate includes the hits for entities written by ndb. It does not include the number of in-context cache hits by ndb.

Each write to the datastore invalidates the caches, so the next read will be served neither from memcache nor from the in-context cache.

One other thing is that ndb creates a new in-context cache for each transaction, so the in-context cache isn't very effective in the face of transactions.
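
To illustrate that last point with a sketch (the naming is mine):

some_key = ndb.Key('SomeKind', 1)  # made up

@ndb.transactional
def read_twice(key):
    first = key.get()   # datastore read; lands in this transaction's fresh in-context cache
    second = key.get()  # served from that per-transaction cache
    # the per-transaction cache is discarded when the transaction ends

read_twice(some_key)
read_twice(some_key)    # a new transaction starts over with an empty in-context cache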

The quick answer is that a memcache hit for a datastore entity is not charged as a datastore read for billing purposes.

1 vote

Following Jim's answer I did further digging, sharing my findings here as others may find them useful.

I congratulated myself for deciding from day one to build and use a generic wrapper class for all my datastore entities, further inherited by specific classes for each entity kind - that is, instead of operating with model classes directly inheriting from ndb.Model.

I already had a db_data @property in this generic class for reading the entity from the datastore on demand, so it was easy to plug in my own memcache-based entity caching scheme, plus tracking code to determine whether the reads come from memcache or from ndb and whether they're done inside transactions or not. This is what it looks like:

@property
def db_data(self):
    # lazily load the entity once and keep it on the wrapper instance
    if not hasattr(self, self.attr_db_data):
        db_data = None
        # only consult memcache outside transactions, unless the entity
        # is flagged readonly (safe to serve cached even inside them)
        if self.readonly or not ndb.in_transaction():
            # noinspection PyTypeChecker
            db_data = memcache.get(self.memcache_key)
        if db_data:
            if not isinstance(db_data, self.db_model):
                logging.error('mismatched cached db_data kind, expected %s got type %s %s' %
                              (self.kind, type(db_data), db_data))
                db_data = None
            else:
                if self.trace:
                    logging.debug('%s from memcache' % self.lid)
                self.cached = True
        if not db_data:
            # fall back to an ndb read, by key or by id
            if hasattr(self, self.attr_db_key):
                db_data = self.db_key.get()
            elif hasattr(self, self.attr_key_id):
                db_data = self.db_model.get_by_id(id=self.key_id)
            else:
                raise RuntimeError('missing db_data, db_key and key_id')
            if db_data:
                # use ndb model names as strings in the list below. TODO: don't leave them on!
                show_trace = self.kind in ['']
                if self.trace or show_trace:
                    if show_trace:
                        self.print_stack(4)
                    logging.debug('%s from datastore' % self.lid)
                self.cached = False
                # noinspection PyTypeChecker
                memcache.set(self.memcache_key, db_data)
        setattr(self, self.attr_db_data, db_data)
    return getattr(self, self.attr_db_data)
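
Usage then looks roughly like this (the concrete wrapper subclass and its constructor are app-specific, so this is purely illustrative):

wrapper = SomeKindWrapper(key_id=some_id)  # hypothetical subclass of the generic wrapper
data = wrapper.db_data  # first access: memcache or ndb read, then held on the instance
if wrapper.cached:
    logging.debug('%s served from my memcache layer' % wrapper.lid)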

With this in place I uncovered:

  • a lot of unnecessary reads from ndb, most notably of the sequence configuration entities, which did not change at all during the entire sequence yet were accessed hundreds of times
  • some reads for the same entities repeated inside the same request
  • most important: the details of my many transactions failing with TransactionFailedError(The transaction could not be committed. Please try again.) or TransactionFailedError: too much contention on these datastore entities. please try again, caused by all these unnecessary reads performed by many transactions executing in parallel. I knew (or rather suspected) the root cause - see the related Contention problems in Google App Engine - but with this detailed info I can now actually work towards reducing them.

For the write side, I added to the generic class a db_data_put() method (with many checks and tracking support) and replaced all the .db_data.put() calls scattered all over the app code with it. This is what it looks like:

def db_data_put(self, force=False):
    # assert ndb.in_transaction()  TODO: re-enable after clearing all violations

    self.db_key = self.db_data.put()

    # sanity checks: log suspicious put patterns, optionally with a stack trace
    show_trace = False
    if not ndb.in_transaction():
        logging.error('%s: db_data_put called outside transaction, readonly=%s' % (self.lid, self.readonly))
        show_trace = True
    if self.readonly:
        logging.error('%s: db_data_put called with readonly set' % self.lid)
        show_trace = True
    if force:
        if self.trace:
            logging.warn('%s: db_data_put called with force arg set' % self.lid)
    if self.cached:
        # writing back data served from my cache risks clobbering fresher datastore data
        logging.error('%s: db_data_put called with cached data' % self.lid)
        show_trace = True
    if self.put_count:
        # repeated writes in the same request can hit the 1 write/sec/entity group limit
        logging.warn('%s: db_data_put already called %d time(s)' % (self.lid, self.put_count))
        show_trace = True
    self.put_count += 1

    if self.update_needed:
        self.update_needed = False
    elif not force:
        if self.trace:
            logging.warn('%s: db_data_put called without update_needed set' % self.lid)
            show_trace = True

    # refresh my memcache copy so subsequent reads are served from it
    # noinspection PyTypeChecker
    memcache.set(self.memcache_key, self.db_data)

    # use ndb model names as strings in the list below. TODO: don't leave them on!
    show_kind = self.kind in ['']

    if show_trace or show_kind:
        self.print_stack(4)

    if self.trace or show_trace or show_kind:
        logging.debug('%s: db_data_put %s' % (self.lid, self.kind))
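
The mechanical replacement throughout the app code was simply:

# before:
obj.db_data.put()
# after (checked and tracked):
obj.db_data_put()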

With it I uncovered some room for improvements and some lurking bugs:

  • entities written outside transactions (room for data corruption)
  • entities written multiple times in response to the same request (thus violating the max 1 write/second/entity group datastore limit)
  • several hot requests occasionally clustering very close to each other, thus contributing to the contention problem and, when writing the same entities, also violating the 1 write/sec limit.

After setting the self.readonly flag on all sequence config objects and on a couple of other frequently-but-unnecessarily read ones (to enable caching even inside transactions), serializing the hot writing requests and fixing the most critical bugs found, I re-ran the clean measurement test:

  • the entire sequence execution time dropped to ~13 minutes, probably from serializing the hot transactions which helped reduce the "noise" of the sequence and the overall number of logic state transitions and push queue tasks
  • the number of data contention-related transaction failures dropped by ~60% - I attribute most of this to my private caching of entities not expected to change during a transaction (ndb has no clue whether an entity accessed in a transaction will eventually be written or not) and to the serializing of the hot transactions
  • the Cloud Datastore Entity Writes dropped to 0.0055 Million Ops - probably due to the fewer sequence state transitions mentioned above
  • the Cloud Datastore Read Operations dropped to 0.01 Million Ops - my caching helps during transactions

The ratio of writes to reads actually increased, invalidating my assumption that they should be comparable - the fact that most of the sequence activity happens inside transactions matters in this case. I guess my assumption would only be valid in a non-transactional context.