1
votes

Now that Google has announced availability of Cloud SQL storage for app engine, what will be the best way to migrate existing data from BigTable/GAE Datastore to MySQL?

To respond to the excellent questions brought up by Peter:

  • In my particular scenario, I kept my data model very relational.
  • I am fine with taking my site down for a few hours to make the transition, or at least warning people that any changes they make for the next few hours will be lost due to database maintenance, etc.
  • My data set is not very large - the main dashboard for my app says .67gb, and the datastore statistics page says it's more like 200mb.
  • I am using python.
  • I am not using the blobstore (although I think that is a separate question from a pure datastore migration - one could migrate datastore usage to MySql while maintaining the blobstore).
  • I would be fine with paying a reasonable amount (say, less than $100).
  • I believe my application is Master/Slave - it was created during the preview period of App Engine. I can't seem to find an easy way to verify that though.

It seems like the bulk uploader should be able to be used to download the data into a text format that could then be loaded with mysqlimport, but I don't have any experience with either technology. Also, it appears that Cloud SQL only supports importing mysqldumps, so I would have to install MqSQL locally, mysqlimport the data, then dump it, then import the dump?

An example of my current model code, in case it's required:

class OilPatternCategory(db.Model):
    version = db.IntegerProperty(default=1)
    user = db.UserProperty()
    name = db.StringProperty(required=True)
    default = db.BooleanProperty(default=False)

class OilPattern(db.Model):
    version = db.IntegerProperty(default=2)
    user = db.UserProperty()
    name = db.StringProperty(required=True)
    length = db.IntegerProperty()
    description = db.TextProperty()
    sport = db.BooleanProperty(default=False)
    default = db.BooleanProperty(default=False)
    retired = db.BooleanProperty(default=False)
    category = db.CategoryProperty()

class League(db.Model):
    version = db.IntegerProperty(default=1)
    user = db.UserProperty(required=True)
    name = db.StringProperty(required=True)
    center = db.ReferenceProperty(Center)
    pattern = db.ReferenceProperty(OilPattern)
    public = db.BooleanProperty(default=True)
    notes = db.TextProperty()

class Tournament(db.Model):
    version = db.IntegerProperty(default=1)
    user = db.UserProperty(required=True)
    name = db.StringProperty(required=True)
    center = db.ReferenceProperty(Center)
    pattern = db.ReferenceProperty(OilPattern)
    public = db.BooleanProperty(default=True)
    notes = db.TextProperty()

class Series(db.Model):
    version = db.IntegerProperty(default=3)
    created = db.DateTimeProperty(auto_now_add=True)
    user = db.UserProperty(required=True)
    date = db.DateProperty()
    name = db.StringProperty()
    center = db.ReferenceProperty(Center)
    pattern = db.ReferenceProperty(OilPattern)
    league = db.ReferenceProperty(League)
    tournament = db.ReferenceProperty(Tournament)
    public = db.BooleanProperty(default=True)
    notes = db.TextProperty()
    allow_comments = db.BooleanProperty(default=True)
    complete = db.BooleanProperty(default=False)
    score = db.IntegerProperty(default=0)

class Game(db.Model):
    version = db.IntegerProperty(default=5)
    user = db.UserProperty(required=True)
    series = db.ReferenceProperty(Series)
    score = db.IntegerProperty()
    game_number = db.IntegerProperty()
    pair = db.StringProperty()
    notes = db.TextProperty()
    entry_mode = db.StringProperty(choices=entry_modes, default=default_entry_mode)
1
this is a pretty wide open question, and possibly a bit premature. For example, all these factors might affect the answer - how much data do you have? do you need to keep your app live while migrating? Do you need to transform the data to make it fit a relational model rather than a entity model? are you using python or java? are you using the blobstore? are you willing to pay to do it quickly or do you want to throttle it so it happens within your free quota? have you already switched to the HRD?Peter Recore
Do you even have access to the limited preview? (if not, you might want to hold off until people have actually used the service, as you'll get better information then.)Peter Recore
@Peter that's a useful checklistsystempuntoout
@Peter I provided more specifics for my situation. I signed up for the limited preview - I haven't been granted access, but at some point I will want to do this.Chris Marasti-Georg
@PeterRecore I now have access to the limited preview.Chris Marasti-Georg

1 Answers

1
votes

Have you considered using the Map Reduce framework? You could write mappers that store the datastore entities in CloudSQL. Do not forget to add a column for the datastore key, this might help you avoiding duplicate rows or identifying missing rows.

You might have a look at https://github.com/hudora/gaetk_replication for an inspiration on the mapper functions.