Create a clustered table in BigQuery from existing table with _PARTITIONTIME

Question

I am trying to make a new clustered table, db.new_table, that have the same data and schema as an existing table, db.old_table, in BigQuery. The existing table have a pseudo column _PARTITIONTIME, and I would like the new table to have this _PARTITIONTIME pseudo column as well.

I have tried using DDL, with query like:

CREATE TABLE `db.new_table`
PARTITION BY DATE(_PARTITIONTIME)
CLUSTER BY field1, field2
AS SELECT * FROM `db.old_table`
WHERE _PARTITIONTIME > '1990-01-01'

However it failed because we cannot use PARTITION BY DATE(_PARTITIONTIME) followed by AS SELECT .... as stated in https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language

Is there a methods to do this? (create a new clustered table with the same exact schema and data from an old table partitioned by pseudo column _PARTITIONTIME )

Any answer and comment is appreciated, thanks.

Notes: I can create a similar table without _PARTITIONTIME with query like:

CREATE TABLE `db.new_table`
PARTITION BY partition_date
CLUSTER BY field1, field2
AS SELECT DATE(_PARTITIONTIME) AS partition_date, * FROM `db.old_table`
WHERE _PARTITIONTIME > '1990-01-01'

However since a lot of things in the system depend on db.old_table, the change in partition field from _PARTITIONTIME to partition_date would cause a lot of query changes... Therefore it would be much preferable if we can create the clustered table with exactly same schema and data.

khan khan · Accepted Answer · 2018-11-19T19:00:21

You can just pre-create your day-partitioned, clustered table (on whatever fields) named db.new_table using either BQ UI or bq command.

Once the table is there, you can populate "for each day" as:

bq query --allow_large_results --append_table --noflatten_results --destination_table 'db.new_table$19900101' "select field1, field2, field3 from db.old_table where _PARTITIONTIME = '1990-01-01'";

Notice two things:

You have to run this query for each day separately (which will cost you pretty much the same so don't worry about it).
The db.new_table$19900101 points to the partition of 1990-01-01 in db.new_table.

Create a clustered table in BigQuery from existing table with _PARTITIONTIME

1 Answers