1
votes

Every week, I get a new dataset that I need to insert in BigQuery. The data can arrive on any day of the week. Once the data is ingested, I want to query data that arrived last week.

One option is to use date as partitioning when the data arrived but then the developers would need to know the exact date when data arrived to query the partition.

Instead of this, while ingestion, I want to create an INTEGER column which represents the calendar week of the year. The format will be 202005 or 202153 where former represents fifth week of 2020 and latter represents second last week of year 2021.

Since this is an integer, the only option for partition seems to be range partitioning. For it, BigQuery is asking for a start, end and interval. What values should I define?

I can define the following but as you can imagine that this sounds wrong

start 202001
end 203054
inerval 1

Update: It seems that bigquery will only create partitions for which it has data. I checked that by executing

#legacySQL
SELECT 
project_id, dataset_id, table_id, partition_id, TIMESTAMP(creation_time/1000) AS creation_time
FROM [PROJECT_ID:DATASET_ID.TABLE_ID$__PARTITIONS_SUMMARY__]
1

1 Answers

2
votes

Another option would be to still Partition by date - but not ingestion date or whatever date you have in mind, rather start date of respective week with the help of DATE_TRUNC function

DATE_TRUNC(your_date, WEEK)

Note: You even can define start day of the week

WEEK(): Truncates date_expression to the preceding week boundary, where weeks begin on WEEKDAY. Valid values for WEEKDAY are SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, and SATURDAY.