2
votes

Wanted some advice on how to deal with table operations (rename column) in Google BigQuery.

Currently, I have a wrapper to do this. My tables are partitioned by date, e.g. for a table named fact, I will have several tables named:

fact_20160301

fact_20160302

fact_20160303... etc

My rename-column wrapper generates aliased queries. For example, if I want to change my table schema from ['address', 'name', 'city'] to ['location', 'firstname', 'town'],

I run a batch query: select address as location, name as firstname, city as town

and write the result back with WRITE_TRUNCATE on the parent tables.
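A minimal sketch of the query-generation half of such a wrapper (dataset and table names here are made up; the backtick quoting assumes standard SQL, legacy SQL would use [dataset.table]):

```python
def rename_query(dataset, table, mapping):
    """Build the aliased SELECT that renames columns per `mapping` (old -> new)."""
    select_list = ", ".join(f"{old} AS {new}" for old, new in mapping.items())
    return f"SELECT {select_list} FROM `{dataset}.{table}`"

mapping = {"address": "location", "name": "firstname", "city": "town"}
sql = rename_query("mydataset", "fact_20160301", mapping)
# The query is then submitted as a batch-priority job with
# writeDisposition=WRITE_TRUNCATE and the same table as the destination.
```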

My main issue lies with the fact that BigQuery only supports 50 concurrent jobs. This means that when I submit my batch request, I can only do around 30 partitions at a time, since I'd like to reserve 20 slots for ETL jobs that are running.

Also, I haven't found a way to poll a batch operation to see whether or not all jobs in the batch have completed.
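One workaround is to keep your own list of submitted jobs and poll each until every one reports done. A sketch, assuming job objects that expose a done() check (the FakeJob class is a stand-in for a real BigQuery job, purely for illustration):

```python
import time

def wait_for_batch(jobs, poll_interval=5.0):
    """Block until every job in `jobs` has completed, polling periodically."""
    pending = list(jobs)
    while pending:
        pending = [job for job in pending if not job.done()]
        if pending:
            time.sleep(poll_interval)

class FakeJob:
    """Stand-in for a real job: reports done after a few polls."""
    def __init__(self, polls_until_done):
        self._remaining = polls_until_done
    def done(self):
        self._remaining -= 1
        return self._remaining <= 0

wait_for_batch([FakeJob(1), FakeJob(3)], poll_interval=0)
```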

If anyone has some tips or tricks, I'd love to hear them.

2
What is the use case behind this select with "rename" and write_truncate? It doesn't sound very practical. Understanding your case would help to answer your question. – Mikhail Berlyant
We are replacing Vertica with BigQuery. Our team gets requests for renaming columns at times, so we rename columns across the board so that it doesn't break any upstream or downstream jobs when they query that particular column. – Jeff Mao
Have you considered using views? It is zero cost vs. executing a kind of 'select into' where you pay for a whole table scan. Still some coding though :) – Mikhail Berlyant
Thanks for the suggestion, but I don't think views would help. Btw, I noticed you are the author of BQ-mate. Great extension :). Everyone on our team who works on BQ has it installed. – Jeff Mao
That's great! Glad it is being used widely! Thank you! – Mikhail Berlyant

2 Answers

3
votes

I can propose two options:

Using View

View creation is very simple to script out and execute - it is fast, and free compared with the cost of scanning the whole table with the select-into approach.
You can create a view using the Tables: insert API by populating the view property (the table's type will then be VIEW)
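A sketch of the request body for Tables: insert (the project, dataset, and table names are placeholders; the query here uses legacy SQL table syntax):

```python
# Request body for the v2 Tables: insert endpoint. The presence of the
# `view` field is what makes BigQuery create a VIEW rather than a table.
view_body = {
    "tableReference": {
        "projectId": "my-project",
        "datasetId": "mydataset",
        "tableId": "fact_renamed_20160301",
    },
    "view": {
        "query": "SELECT address AS location, name AS firstname, city AS town "
                 "FROM [mydataset.fact_20160301]",
    },
}
# POST view_body to:
# https://www.googleapis.com/bigquery/v2/projects/my-project/datasets/mydataset/tables
```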

Using Jobs: insert EXTRACT and then LOAD

Here you can extract the table to GCS and then load it back into BigQuery with an adjusted schema
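Sketched as two Jobs: insert configurations (bucket and table names are placeholders). CSV is used so the load can map columns positionally onto the renamed schema; a JSON export would carry the old field names as keys:

```python
# Job configuration for the EXTRACT step: dump the partition to GCS as CSV.
extract_config = {
    "extract": {
        "sourceTable": {"projectId": "my-project",
                        "datasetId": "mydataset",
                        "tableId": "fact_20160301"},
        "destinationUris": ["gs://my-bucket/fact_20160301/*.csv"],
        "destinationFormat": "CSV",
    }
}

# Job configuration for the LOAD step: read the CSV back, truncating the
# table and applying the schema with the new column names (matched by position).
load_config = {
    "load": {
        "sourceUris": ["gs://my-bucket/fact_20160301/*.csv"],
        "destinationTable": {"projectId": "my-project",
                             "datasetId": "mydataset",
                             "tableId": "fact_20160301"},
        "sourceFormat": "CSV",
        "writeDisposition": "WRITE_TRUNCATE",
        "schema": {"fields": [
            {"name": "location", "type": "STRING"},
            {"name": "firstname", "type": "STRING"},
            {"name": "town", "type": "STRING"},
        ]},
    }
}
```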

The above approach will a) eliminate the cost of querying (scanning) the tables and b) help with the concurrency limits. But it might not, depending on the actual volume of your tables and other requirements you might have

0
votes

The best way to manipulate a schema is through the Google BigQuery API.

  1. Use the tables.get API to retrieve the existing schema for your table: https://cloud.google.com/bigquery/docs/reference/v2/tables/get
  2. Manipulate the schema, renaming columns etc.
  3. Again using the API, perform an update on the table, setting the schema to your newly modified version. This should all occur in one job: https://cloud.google.com/bigquery/docs/reference/v2/tables/update
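The schema rewrite in step 2 can be a simple mapping over the fields returned by tables.get; a sketch under the assumption that the update endpoint accepts the change (BigQuery restricts which schema modifications an update will allow, so verify that a rename is actually accepted before relying on this):

```python
def rename_schema_fields(schema, mapping):
    """Return a copy of a tables.get `schema` with columns renamed per `mapping` (old -> new)."""
    return {
        "fields": [
            dict(field, name=mapping.get(field["name"], field["name"]))
            for field in schema["fields"]
        ]
    }

# Example schema as returned by tables.get (simplified).
old_schema = {"fields": [
    {"name": "address", "type": "STRING"},
    {"name": "name", "type": "STRING"},
    {"name": "city", "type": "STRING"},
]}
new_schema = rename_schema_fields(
    old_schema, {"address": "location", "name": "firstname", "city": "town"}
)
# new_schema is then sent back in the body of the tables.update request.
```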