Every week I receive a new dataset that I need to load into BigQuery. The data can arrive on any day of the week. Once the data is ingested, I want to be able to query the data that arrived during the previous week.
One option is to partition by the date the data arrived, but then developers would need to know the exact arrival date in order to query the right partition.
Instead, during ingestion I want to create an INTEGER column that represents the calendar week of the year. The format would be 202005 or 202153, where the former represents the fifth week of 2020 and the latter the second-to-last week of 2021.
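To make the intended column concrete, here is a minimal Python sketch of how such a week key could be derived at ingestion time. It assumes ISO 8601 week numbering (weeks run from 1 to 52 or 53, and the first days of January can belong to the last week of the previous ISO year), which may not be the numbering I end up using; the names week_key and previous_week_key are only illustrative.

```python
from datetime import date, timedelta


def week_key(arrival: date) -> int:
    """Return an integer YYYYWW key built from the ISO year and ISO week.

    Assumption: ISO 8601 week numbering. The ISO year can differ from the
    calendar year for dates close to January 1st.
    """
    iso_year, iso_week, _ = arrival.isocalendar()
    return iso_year * 100 + iso_week


def previous_week_key(today: date) -> int:
    """Return the key of the week before the one containing `today`."""
    return week_key(today - timedelta(days=7))
```

With a column like this, a developer who wants last week's data only needs previous_week_key(date.today()) as the filter value and does not need to know the exact arrival date.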
Since this is an integer, the only partitioning option seems to be integer range partitioning. For that, BigQuery asks for a start, an end, and an interval. What values should I define?
I could define the following, but as you can imagine, this seems wrong:
start 202001
end 203054
interval 1
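To show why these values feel wrong, here is a rough Python sketch that counts how many ranges such a definition describes and how many of them could ever receive data. It relies on the range start being inclusive and the range end being exclusive, and it again assumes ISO week numbering; the function names are hypothetical.

```python
from datetime import date


def range_partition_count(start: int, end: int, interval: int) -> int:
    """Number of ranges of width `interval` covering [start, end)."""
    return -(-(end - start) // interval)  # ceiling division


def iso_week_keys(first_year: int, last_year: int) -> list:
    """All YYYYWW keys that exist under ISO week numbering (an assumption)."""
    keys = []
    for year in range(first_year, last_year + 1):
        # December 28th always falls in the last ISO week of its year.
        last_week = date(year, 12, 28).isocalendar()[1]
        keys.extend(year * 100 + week for week in range(1, last_week + 1))
    return keys


total_ranges = range_partition_count(202001, 203054, 1)
usable_keys = [k for k in iso_week_keys(2020, 2030) if 202001 <= k < 203054]
print(total_ranges, len(usable_keys))
```

Under these assumptions the definition describes 1053 ranges, but only 574 of them correspond to a real week; values such as 202054 through 202100 would never be populated.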
Update: It seems that BigQuery only creates partitions for which it has data. I checked this by executing:
#legacySQL
SELECT
project_id, dataset_id, table_id, partition_id, TIMESTAMP(creation_time/1000) AS creation_time
FROM [PROJECT_ID:DATASET_ID.TABLE_ID$__PARTITIONS_SUMMARY__]