Transforming to new “Partitioned Tables” from existing system

Question

I have plenty of tables sharded by date. I am exploring the way to move on and use the new "Partitioned Tables" instead. I have found this reference for converting tables: https://cloud.google.com/bigquery/docs/creating-partitioned-tables#converting_dated_tables_into_a_partitioned_table

However, as I have many processes relating to existing tables (loading data or Querying), I am looking for a phased transition approach. Meaning: I would like to change the load process and the table definition at 1st, and afterwards, slowly change all relating processes.

How should I go about this transition? (We have considered and ruled out the option to duplicate everything and maintain each process twice due to obvious reasons.)
Is there a way to use the old table names convention (MyTable_YYYYMMDD) when querying a partition of the new transitioned table?
Can I use the older Table wildcard functions to query it until I transform all my query templates?
Are there existing partitioned tables on a public dataset where we can experiment?

Thanks

Pavan Edara Pavan Edara · Accepted Answer · 2016-06-06T17:59:41

This is not exactly quite what you want, but may serve the purpose. With standard SQL, BigQuery supports the equivalent of table wildcard functions:

Documentation for this feature is going to be available soon, but to summarize:

We provide a pseudo column _TABLE_SUFFIX that provides a way to address the table. For example, the following query allows you to read from tables myproject.mydataset.20150105, and myproject.mydataset.20150106 (assuming value1 is a field in both).

SELECT _TABLE_SUFFIX AS table_name, SUM(value1) FROM myproject.mydataset.* WHERE _TABLE_SUFFIX = '20150105' OR _TABLE_SUFFIX = '20150106' GROUP BY 1;

For your use case:

You could first convert your queries against date sharded tables to use _TABLE_SUFFIX pseudo column. Then, when you are able to convert these tables to partitioned tables. After that point, you will need a minor change to your scripts to use the partitioning pseudo column (since the pseudo column name is a bit different in these two cases).

Transforming to new “Partitioned Tables” from existing system

2 Answers