
I am new to Google App Engine and I am a little confused by the answers related to connecting to a local Datastore.

My ultimate goal is to stream data from a Google Datastore to a BigQuery dataset, similar to https://blog.papercut.com/google-cloud-dataflow-data-migration/. I have a local copy of this Datastore, accessible when I run a local App Engine instance, i.e. I can access it through the admin console when I run $[GOOGLE_SDK_PATH]/dev_appserver.py --datastore_path=./datastore.

I would like to know if it is possible to connect to this datastore from services running outside the App Engine instance, with the Python google-cloud-datastore library or even Apache Beam's ReadFromDatastore method. If not, should I use the Datastore Emulator with the file generated by the App Engine Datastore?

If anyone has an idea on how to proceed, I would be more than grateful.


1 Answer


If it is possible, it would have to be through the Datastore Emulator, which can also serve apps other than App Engine. But it ultimately depends on the implementation of the libraries you intend to use - whether their underlying access methods understand the DATASTORE_EMULATOR_HOST environment variable pointing to a running Datastore emulator and use it instead of the real Datastore. I guess you'll just have to give it a try.
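As a minimal sketch of that environment-variable mechanism (assuming the emulator is running on its default localhost:8081 endpoint; the project id here is hypothetical), a client outside App Engine could be pointed at the emulator like this:

```python
import os

# Point Datastore client libraries at a locally running emulator instead of
# the live service. The emulator prints its host:port on startup; 8081 is
# the default (assumption: emulator started with default settings).
os.environ["DATASTORE_EMULATOR_HOST"] = "localhost:8081"
os.environ["DATASTORE_PROJECT_ID"] = "my-local-project"  # hypothetical id

# With these variables set, google-cloud-datastore picks up the emulator
# automatically when the client is constructed:
# from google.cloud import datastore
# client = datastore.Client(project="my-local-project")
```

google-cloud-datastore is documented to honor DATASTORE_EMULATOR_HOST; whether your Beam version's ReadFromDatastore does the same is exactly what you would need to test.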

But be aware that the internal format of the local storage dir used by the Datastore Emulator may differ from the one used by the development server, so make a backup of your .datastore dir before trying anything, just in case. From Local data format conversion:

Currently, the local Datastore emulator stores data in sqlite3 while the Cloud Datastore Emulator stores data as Java objects.

When dev_appserver is launched with legacy sqlite3 data, the data will be converted to Java objects. The original data is backed up with the filename {original-data-filename}.sqlitestub.