1
votes

I am writing a scheduled federated query to load my BiqQuery tables on a daily basis. BigQuery table load strategy is Overwrite. My source is a Cloud SQL database (mysql instance).

I am wondering what would be the correct approach from a performance and cost-optimization perspective in the long run to load my BigQuery tables? Should I overwrite my BigQuery tables daily with source data or should I build a logic in my federated query itself using joins to detect just the new additions in source and then add them to my BigQuery table during daily scheduled runs?

1

1 Answers

2
votes

Your second idea is the way to go.

I build a logic in my federated query itself using joins to detect just the new additions in source and then add them to my BigQuery table

The less amount of data BigQuery needs to read/write the cheaper it will be.

This is an approach generally referred to as incremental