1 vote

Is there an easy way to migrate a large (100G) Google cloud sql database to Google datastore?

The way that comes to mind is to write a python appengine script for each database and table and then put it into the datastore. That sounds tedious but maybe it has to be done?

Side note, the reason I'm leaving cloud sql is because I have jsp pages with multiple queries on them and they are incredibly slow even with a d32 sql instance. I hope that putting it in the datastore will be faster?

There seems to be a ton of questions about moving away from the datastore to cloud sql, but I couldn't find this one.

Thanks

2
Do you have an example that we can use to reproduce the slowness? Note that Datastore is a NoSQL solution, so the joins that SQL provides might require writing MapReduce jobs. Datastore should provide better scalability for point SELECTs or UPDATEs that don't conflict. — Razvan Musaloiu-E.

2 Answers

2 votes

Here are a few options:

  1. Write an App Engine MapReduce [1] program that pulls data in appropriate chunks from Cloud SQL and writes it to Datastore.
  2. Spin up a VM on Google Compute Engine and write a program that fetches the data from Cloud SQL and writes it to Datastore using the external Datastore API [2].
  3. Use the Datastore restore facility [3]. I'm not familiar with the format, so I don't know how much work it is to produce something that the restore will accept.

[1] https://cloud.google.com/appengine/docs/python/dataprocessing/
[2] https://cloud.google.com/datastore/docs/apis/overview
[3] https://cloud.google.com/appengine/docs/adminconsole/datastoreadmin?csw=1#restoring_data
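Option 2 could be sketched roughly as below. This is a minimal illustration, not a definitive implementation: it assumes the `google-cloud-datastore` client library and a MySQL driver are installed on the VM, and the `customers` table and its columns are hypothetical.

```python
def row_to_properties(columns, row):
    """Pure helper: pair SQL column names with one result row's values."""
    return dict(zip(columns, row))


def migrate(sql_conn, project_id, kind="Customer"):
    """Copy every row of a (hypothetical) `customers` table into Datastore."""
    from google.cloud import datastore  # third-party: pip install google-cloud-datastore

    client = datastore.Client(project=project_id)
    cursor = sql_conn.cursor()
    cursor.execute("SELECT id, name, email FROM customers")  # hypothetical schema
    columns = [desc[0] for desc in cursor.description]
    for row in cursor:
        props = row_to_properties(columns, row)
        key = client.key(kind, props.pop("id"))  # reuse the SQL primary key as the entity key
        entity = datastore.Entity(key=key)
        entity.update(props)
        client.put(entity)
```

One `put` per row is slow for 100G of data; `put_multi` can batch writes, and chunking the `SELECT` (e.g. by primary-key range) keeps memory bounded.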

0 votes

I wrote a couple of scripts that do this, running on Compute Engine.

Using the GCP Datastore API:
import googledatastore
Here is the code:
https://gist.github.com/nburn42/d8b488da1d2dc53df63f4c4a32b95def

And using the Dataflow API:
from apache_beam.io.gcp.datastore.v1.datastoreio import WriteToDatastore
Here is the code:
https://gist.github.com/nburn42/2c2a06e383aa6b04f84ed31548f1cb09

I hit a quota-exceeded error after it reaches 100,000 entities, though, and have to wait another day before loading the next batch.
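Independently of the daily quota, Datastore caps each commit at 500 entities, so bulk loaders have to group writes into batches of at most that size. A small self-contained batching helper (the commented `put_multi` usage assumes the `google-cloud-datastore` client):

```python
def chunked(items, size=500):
    """Yield successive lists of at most `size` items.

    Datastore commits are limited to 500 entities per call, so writes
    should be grouped into batches no larger than that.
    """
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly short, batch
        yield batch


# Usage sketch (requires a google-cloud-datastore client instance):
# for batch in chunked(entities):
#     client.put_multi(batch)
```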

Hopefully these are useful to someone with a smaller database than mine.

(The quota problem is discussed here: Move data from Google Cloud-SQL to Cloud Datastore.)