2
votes

It does not seem to be possible to schedule Queries in BigQuery that write to time partitioned and clustered target tables (using WRITE_TRUNCATE and a partition decorator): we are getting the error message:

Invalid value: Incompatible table partitioning specification. Expects partitioning specification interval(type:day) clustering(siteId,channelId), but input partitioning specification is interval(type:day)

I don't understand why this is happening, isn't the clustering specification just a part of the table definition? We also don't need to specify anything extra when performing dml inserts data in an already clustered table. Or is this related to us not using DML in the scheduled query?

EDIT: The scheduled query structure looks like this:

SELECT
  reportDate,
  channelId,
  siteId,
  pageType,
  [MORE_COLUMNS_SELECT),
  SUM(timeOnPage) AS timeOnPage_agg,
  ARRAY_AGG(STRUCT( sessionId,
      [MORE_COLUMNS_NESTED)
 ) AS Details
  ----
FROM `project.dataset.viewname` 
WHERE    reportDate >= TIMESTAMP_TRUNC(TIMESTAMP_ADD(@run_time, INTERVAL -1 DAY), DAY)
     AND reportDate < TIMESTAMP_TRUNC(@run_time, DAY)
GROUP BY
    reportDate,
  channelId,
  siteId,
  pageType,
  [MORE_COLUMNS_SELECT)

I am writing the results of this query to the target table like this: TARGET_TABLE_NAME${run_time-24h|"%Y%m%d"}

That table is time partitioned on _PARTITIONTIME (= Reportdate) and clustered on siteId, channelId

1
Please file a bug with as much detail about your configuration as you can provide. It sounds like a limitation of the scheduled queries feature that it doesn't take clustering into account. - Elliott Brossard
Context please! What query are you trying to run, can I try to run a similar query... - Felipe Hoffa
Hi @FelipeHoffa, I have added the example pattern to this post. What Elliott Brossard mentions might be right, I am going to check today whether it does work with a scheduled DML statement - Rogier Werschkull

1 Answers

1
votes

As of 23-10-2018, it seems that the BigQuery scheduled query functionality does NOT support the WRITE_TRUNCATE loading pattern in combination with clustering.

What DOES work however, is writing to a clustered target table using a DML statement.