6
votes

I have the following requirement.

We have 1400 sharded tables in BQ based on specific functional grouping. Maximum of these individual sharded tables are also time-partitioned for DAY.

Smaller tables are not time-partitioned.

I am trying to create views dynamically in BigQuery by doing a UNION of all the sharded tables and writing a WHERE filter condition on _partitiontime.

But the problem is there could be smaller tables which are not time-partitioned, the query fails.

There are many more tables created on daily basis and I cant settle for a static exclusion solution since that would require manually maintaining a file with table names etc. (Apache Beam + BQ uses the data that is coming in and creates new sharded tables for newer functional groups without manual intervention.)

I am trying to exclude the non time-partitioned tables by using bq utility to connect to BQ more than 1000 times to check if a table is time-partitioned.

bq show --format=prettyjson and check the timepartitioning type field.

This is very slow, more than 30 minutes.

I have tried the __TABLES_SUMMARY__, but it does not have the partitioning info.

I have also checked SELECT partition_id from [mydataset.table1$__PARTITIONS_SUMMARY__];, but this works if the table is already partitioned.

3
Did you found any workaround for this? Instead having separate dataset? - Jaya Ananthram

3 Answers

2
votes

You can write the below query in standard SQL in BQ to get the information

SELECT * 
  FROM `project_id.dataset_name.INFORMATION_SCHEMA.COLUMNS`
  WHERE TABLE_NAME = Table_Name 
    AND is_partitioning_column = "YES"
1
votes

Unfortunately, there is no way to determine this dynamically.

As a workaround, I would suggest keeping your partitioned tables and your non-partitioned tables in distinct datasets OR including information about whether the table is partitioned in its name.

You could also maintain a list of table prefixes that are known to be partitioned (or non-partitioned, if that is easier to maintain).

You can also open a feature request on the public issue tracker for BigQuery: https://issuetracker.google.com/issues/new?component=187149&template=0

0
votes

You can write a function that runs a query like this with try and except, the query will error if the table is not partitioned.

SELECT partition_id FROM [%s.%s$__PARTITIONS_SUMMARY__]' % (dataset_id, table_id)