3
votes

I have a partitioned table in BigQuery to store user events, which has following schema,

user_id - String, Required
event_time - DateTime, Required
country - String, Nullable
_PARTITIONTIME - event occurred date

Above table holds two years worth of data (means 730 partitions).

For some reason I would like to change the data type from DateTime to Timestamp. I found a similar questions here and here, those are working fine for non-partitioned table. But in my case, I have partitioned table so I need to retain the value _PARTITIONTIME as such.

After some research I found, running the following query with Destination Table as table_name$20180126 seems to achieve my goal for a single partition,

SELECT user_id, CAST(event_time AS TIMESTAMP) AS event_time, country from [project-id:data_set.table_name] WHERE _PARTITIONTIME >= "2018-01-26 00:00:00" AND _PARTITIONTIME < "2018-01-27 00:00:00"

But the problem is, I have 730 partition for a table (similarly I have 10 more table like this), running above query one by one in web console or through API or through BQ command line will take ages. Is there is any other better way to achieve the use case with minimal work effort?

1
Although DDL has been recently launched it only support Create and Delete syntax. No update yet. Scripting this as you mentioned, is the way to go. It should not take forever as you can issue multiple one in parallel.Pentium10
@Pentium10 - if scripted, does that mean 730 full table scans ($$$) ?Graham Polley
@GrahamPolley Do you suggest any other approach?Jaya Ananthram
I'm thinking you could use Dataflow or Mikhails approach over here (slightly tweaked of course): stackoverflow.com/questions/38993877/…Graham Polley
Could you patch your table with a new column. Run an update statement to populate the new column. Eventually rewrite the table by removing the old column(either keep it). cloud.google.com/bigquery/docs/reference/rest/v2/tables/patchPentium10

1 Answers

1
votes

There has been a lot of discussion in the comments, but I want to highlight two things.

  1. As of today you can create partitioned tables using an existing column TIMESTAMP or DATETIME type as the partition definer.

  2. Also there is a guide pubished for Manually Changing Table Schemas

You may want to read both recommendations especially the first one come handy to you, and you would choose reloading data for lots of benefits.