6
votes

People, the company where I work has some MySQL databases on AWS (Amazon RDS). We are making a POC with BigQuery and what I am researching now is how to replicate the bases to BigQuery (the existing registers and the new ones in the future). My doubts are:

  • How to replicate the MySQL tables and rows to BigQuery. Is there any tool to do that (I am reading about Amazon Database Migration Service)? Should I replicate to Google Cloud SQL and than export to BigQuery?
  • How to replicate the future registers? Is possible to create a job inside MySQL to send the new registers after a predefined number? For example, after 1,000 new rows are inserted (or a time is passed), some event is "triggered" and the new registers are copied to Cloud SQL/BigQuery?

My initial idea is to dump the original base, load it to the other and use a script to listen to new registers and send them to the new base.

Have I explained it properly? Is it understandable?

2
I use xplenty to basically mirror tables from mysql AWS RDS to BQ. Xplenty can drop and recreate the tables. You have to pay for it but is very quick and easy. Could be useful to sign up for trail to even use it for poc. That's kinda what i did and am still using it as let's me focus more on the more useful stiff like how I'm using the data in BQ. Might be a quick solution if it's literally just you doing this and need to be quick.andrewm4894

2 Answers

2
votes

You will need to use one of the ETL tools which have integration with both mySQL and BigQuery to perform initial transfer of the data and copy subsequent changes to BigQuery. Take a look on the list of available tools [1]

You can also implement your own tool by developing a process which will extract the data from mySQL to a CSV file and then load that file into BigQuery using data import [2]

[1] https://cloud.google.com/bigquery/third-party-tools

[2] https://cloud.google.com/bigquery/loading-data-into-bigquery

2
votes

In addition to what Vadim said, you can try:

  • mysqldump to CSV files to s3 (I believe RDS allows that)
  • run "gsutil" Google Cloud Storage utility to copy data from s3 to GCS
  • run "bq load file.csv" to load the file to BigQuery

I'm interested in hearing your experience, so feel free to ping me in private.