3
votes

I'm trying to implement a complete backup/restore function for my Google App Engine/Datastore solution. I'm using the recommended export/import mechanism (https://cloud.google.com/datastore/docs/export-import-entities) for periodic backup and for restore. One thing I cannot wrap my head around is how to restore to an empty datastore: the import function won't clear the datastore before importing, so I have to implement a total wipe of the datastore myself. (A way to clear the datastore would also be useful for testing purposes, etc.)

The Datastore Admin console is not an option since it is being phased out.

The recommended way, according to the Google documentation, is to use the bulk delete Dataflow template: https://cloud.google.com/dataflow/docs/templates/provided-templates#cloud-datastore-bulk-delete. The problem with this method is that I would have to launch one Dataflow job for each namespace/kind combination. And I have a multi-tenant solution with one namespace per tenant and around 20 kinds per namespace. So with e.g. 100 tenants, that would mean 2,000 Dataflow jobs to wipe the datastore, while the default quota is 25 simultaneous jobs... Yes, I can contact Google to get a higher quota, but the difference between those numbers suggests that I'm doing it wrong.
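Just to make the mismatch concrete, the number of jobs scales as tenants × kinds. A quick back-of-the-envelope check (the figures are the ones from the question; the variable names are mine):

```python
tenants = 100            # one namespace per tenant
kinds_per_namespace = 20 # roughly 20 kinds in each namespace
default_quota = 25       # default simultaneous Dataflow jobs

jobs_needed = tenants * kinds_per_namespace
print(jobs_needed)                  # 2000 bulk-delete jobs in total
print(jobs_needed // default_quota) # at best 80 sequential "waves" of jobs
```

So even with perfect scheduling, a full wipe would take 80 rounds of Dataflow job launches at the default quota.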

So, any suggestions on how to wipe my entire datastore? I'm hoping for a scalable solution (that won't exceed request timeout limits etc) where I don't have to write hundreds of lines of code...

2 Answers

1
vote

One possibility is to create a simple first-generation Python 2.7 GAE application (or just a service) in that project and use the ndb library (typically more efficient than the generic Datastore APIs) to implement on-demand selective or total datastore wiping, along the lines described in How to delete all the entries from google datastore?

0
votes

This solution deletes all entities in all namespaces. By using ndb.metadata, no model classes are needed, and by using ndb.delete_multi_async it can handle a reasonably large datastore before hitting the request time limit.

from google.appengine.api import namespace_manager
from google.appengine.ext import ndb

...

    def clearDb():
        futures = []
        for namespace in ndb.metadata.get_namespaces():
            namespace_manager.set_namespace(namespace)
            for kind in ndb.metadata.get_kinds():
                # keys_only avoids fetching the entity payloads
                keys = [k for k in ndb.Query(kind=kind).iter(keys_only=True)]
                # fire the deletes asynchronously so the query for the
                # next kind can run while these RPCs are in flight
                futures.extend(ndb.delete_multi_async(keys))
        # make sure every delete RPC has completed before returning
        ndb.Future.wait_all(futures)
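If a single kind holds a very large number of entities, one refinement is to issue the deletes in fixed-size batches rather than in one huge call. A minimal sketch of such a helper in plain Python (the name `batched` and the size 500, roughly the Datastore per-commit entity limit, are my own choices, not part of the answer above):

```python
def batched(keys, size=500):
    """Yield successive fixed-size chunks from a list of keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]
```

Inside clearDb one could then write something like `for batch in batched(keys): futures.extend(ndb.delete_multi_async(batch))`, keeping each RPC bounded.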

The solution is a combination of the approaches from other answers on this topic; refer to those for tips on how to improve it when time limits are hit and how to avoid instance explosion.