0
votes

I have a requirement where I have a partitioned table in BigQuery(Hourly partition) and I have to write a batch job which will process one partition at a time.

In hive, its pretty easy to do that like I can do show partition in the main table and one metadata table where i can maintain the processed partition and later create a difference between the two dataset using except to identify the remaining partition to process.

I'm looking this in BigQuery where I can do something similar to show partition on main table and maintain a metadata table where I can maintain the processed partition.

I found something like I can do that on main table

SELECT DISTINCT FORMAT_DATETIME('%Y-%m-%d %H:%M:%S', EXTRACT(DATETIME FROM _PARTITIONTIME)) FROM DATASET.TABLE_NAME

But this will scan the full table and overhead on the cost.

1

1 Answers

0
votes

Using legacy sql you can run the following:

#legacySQL
SELECT 
partition_id
FROM [project_id:dataset.table$__PARTITIONS_SUMMARY__]

see additional documentation here on the partition summary table