2 votes

I am new to BigQuery and GCP. I am working with a (big) public dataset available in BigQuery, on which I am running a SQL query that selects a bunch of data from one of the tables in the dataset based on a simple WHERE clause.

I then perform additional operations on the obtained data. I only need to run this query once a month, but the other operations need to run more often (hourly).

My problem is that every time I do this, it causes BigQuery to process 4+ million rows of data, and the cost of running this query is quickly adding up for me.

  • Is there a way I can run the SQL query and export the data to another table/database in GCP, and then run my operations on that exported data?
  • Am I correct in assuming (and I could be wrong here) that once I export the data to a standard SQL database in GCP, the cost per query will be less in that exported database than it is in BigQuery?

Thanks!

Have you taken a look at the documentation? It doesn't per se tell you how to minimize costs, but it does provide some strategies for controlling and previewing them. And in regards to your last question: yes, costs will be lower if you materialize query results in stages by writing them into a destination table. – Maxim
Someone just asked a very similar question - but they gave way more specifics. Take a look: stackoverflow.com/questions/59462279/… – Felipe Hoffa
Thank you both for your suggestions. For now, I am saving the results to a separate table in my own dataset and working off that. The query estimator says I am not churning through too many bytes, so hopefully this helps reduce costs. I will eventually move this data to Bigtable and consolidate my queries to minimize costs even further. – Quest Monger

1 Answer

3 votes

Is there a way I can run the SQL query and export the data to another table/database in GCP, and then run my operations on that exported data?

You can run your SQL queries and export the data into another table/database in GCP by using the Client Libraries for BigQuery. You can also refer to the documentation on how to export table data from BigQuery.
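
For example, here is a minimal sketch (using the Python client library for BigQuery) of running the monthly query once and materializing its results into a table in your own dataset, so the hourly operations read from that table instead of re-scanning the public one. The project, dataset, table, and query below are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical destination table in your own dataset; replace with your IDs.
destination = bigquery.TableReference.from_string(
    "my-project.my_dataset.monthly_extract"
)

job_config = bigquery.QueryJobConfig(
    destination=destination,
    write_disposition="WRITE_TRUNCATE",  # overwrite the table each month
)

sql = """
    SELECT *
    FROM `bigquery-public-data.some_dataset.some_table`
    WHERE some_column = 'some_value'
"""

# Run the expensive query once a month; the hourly jobs then query
# my-project.my_dataset.monthly_extract instead of the public table.
query_job = client.query(sql, job_config=job_config)
query_job.result()  # wait for the job to finish
```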

As for the most efficient way to do it, I would use both the BigQuery API and the Cloud SQL Admin API (for the other table/database).

The BigQuery documentation has an API example for extracting a BigQuery table to your Cloud Storage Bucket.
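
A rough sketch of that extract step with the Python client library; the table and bucket names below are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder table and bucket names.
source_table = bigquery.TableReference.from_string(
    "my-project.my_dataset.monthly_extract"
)
destination_uri = "gs://my-bucket/exports/monthly_extract-*.csv"  # wildcard shards large exports

# Export the table contents to Cloud Storage (CSV by default).
extract_job = client.extract_table(source_table, destination_uri, location="US")
extract_job.result()  # wait for the export to finish
```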

Once the data is in Cloud Storage, you can use the Cloud SQL Admin API to import it into your desired database/table. I have attached documentation regarding best practices on how to import/export data within Cloud SQL.
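
As an illustration only, a Cloud SQL import can be started from Python through the Cloud SQL Admin API (via the google-api-python-client discovery client). The project, instance, database, table, and file names below are all placeholders, and the caller needs the appropriate IAM permissions on both the instance and the bucket.

```python
from googleapiclient import discovery

# Hypothetical identifiers; replace with your own.
project = "my-project"
instance = "my-cloudsql-instance"

sqladmin = discovery.build("sqladmin", "v1beta4")

body = {
    "importContext": {
        "fileType": "CSV",
        "uri": "gs://my-bucket/exports/monthly_extract-000000000000.csv",  # placeholder shard
        "database": "my_database",
        "csvImportOptions": {"table": "monthly_extract"},
    }
}

# Kicks off a server-side import of the CSV file into the Cloud SQL table.
operation = (
    sqladmin.instances()
    .import_(project=project, instance=instance, body=body)
    .execute()
)
print(operation["name"])  # operation ID you can poll for completion
```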

Once the import is done, you can delete the residual files from your Cloud Storage bucket, either using the console or by interacting with the Cloud Storage API.
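
For instance, with the Cloud Storage Python client the leftover export shards could be cleaned up like this (bucket name and prefix are placeholders):

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")  # placeholder bucket name

# Delete every exported shard under the exports/ prefix once the import succeeds.
for blob in bucket.list_blobs(prefix="exports/"):
    blob.delete()
```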

Am I correct in assuming (and I could be wrong here) that once I export the data to a standard SQL database in GCP, the cost per query will be less in that exported database than it is in BigQuery?

As for pricing, here you will find how to estimate storage and query costs within BigQuery, and for other databases like Cloud SQL, here you will find more information about Cloud SQL pricing.
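
One practical way to estimate query cost up front is a dry run, which validates the query and reports the bytes it would scan without actually running it (and without being billed). A small sketch with the Python client, where the query itself is a placeholder:

```python
from google.cloud import bigquery

client = bigquery.Client()

# dry_run validates the query and reports bytes processed without running it.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

sql = """
    SELECT *
    FROM `bigquery-public-data.some_dataset.some_table`
    WHERE some_column = 'some_value'
"""

query_job = client.query(sql, job_config=job_config)
print(f"This query would process {query_job.total_bytes_processed} bytes.")
# Multiply by the current on-demand price per TB scanned to estimate the cost.
```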

Nonetheless, as Maxim pointed out, you can refer to both the best practices for controlling costs within BigQuery in order to maximize efficiency and minimize cost, and also the best practices for using Cloud SQL.

Both can greatly help you minimize cost and be more efficient in your queries or imports.

I hope this helps.