0
votes

I have a BigQuery table partitioned by Day on a timestamp field as below:

Table details

Data Sample:

Row _time                    dummy_column
1   2020-06-15 23:57:00 UTC  a
2   2020-06-15 23:58:00 UTC  b
3   2020-06-15 23:59:00 UTC  c
4   2020-06-16 00:00:00 UTC  d
5   2020-06-16 00:00:01 UTC  e
6   2020-06-16 00:00:02 UTC  f

Due to the fact that the table is partitioned on _time but it is partition by Day, so in order to query in a specific day partition 2020-06-15, I run:

select * from {DATASET}.{TABLE} where _time >= TIMESTAMP("2020-06-15") and _time < TIMESTAMP("2020-06-16");

Result:

Row _time                    dummy_column
1   2020-06-15 23:57:00 UTC  a
2   2020-06-15 23:58:00 UTC  b
3   2020-06-15 23:59:00 UTC  c

My question is: Is there a way to query a Day partition directly by mentioning the it explicitly instead of querying using a timestamp range?

1

1 Answers

1
votes

When you have a table partitioned by Day, you can directly reference the partition day you want to query.

In order to demonstrate your case, I have used the following table schema:

Field name      Type    Mode      Policy tags   Description
date_formatted  DATE    NULLABLE    
fullvisitorId   STRING  NULLABLE    

Other table's details,

Table type              Partitioned
Partitioned by          Day
Partitioned on field    date_formatted
Partition filter        Not required

And some sample data,

Row date_formatted  fullvisitorId
1   2016-12-30  6449885916997461186
2   2016-12-30  3401232735815769402
3   2016-12-30  2100622457042859506
4   2016-12-30  4434434796889840043
5   2016-12-31  9382207991125014696
6   2017-12-30  4226029488400478200
7   2017-12-31  4304624161918005939
8   2017-12-31  4239590118714521081
9   2018-12-30  0030006068136142781
10  2018-12-30  7849866399135936504

You can use the syntax below to query the above sample data,

DECLARE dt DATE DEFAULT Date(2016,12,30);

SELECT * FROM `project.dataset.table_name` WHERE date_formatted = dt

The output,

Row date_formatted  fullvisitorId
1   2016-12-30  6449885916997461186
2   2016-12-30  3401232735815769402
3   2016-12-30  2100622457042859506
4   2016-12-30  4434434796889840043

As you can see it only retrieved the data for the specific date I declared.

Notice that I have used the DECLARE clause because it facilitates modifying the date filter. Also, if your field is formatted as a TIMESTAMP, you can replace DATE() to TIMESTAMP() to define your filter within your variable.

As an additional information, if you want to use a range, consider using the BETWEEN clause such as WHERE partition_field BETWEEN date_1 and date_2.

UPDATE:

I have used your sample data this time, I have used the below syntax to create a table exactly like you described. Below is the code:

create table dataset.table_name(_time timestamp, dummy_column string) partition by date(_time) 
as select  timestamp '2020-06-15 23:57:00 UTC' as _time, "a" as dummy_column  union all
select  timestamp '2020-06-15 23:58:00 UTC' as _time, "b" as dummy_column  union all
select  timestamp '2020-06-15 23:59:00 UTC' as _time, "c" as dummy_column  union all
select  timestamp '2020-06-16 00:00:00 UTC' as _time, "d" as dummy_column  union all
select  timestamp '2020-06-16 00:00:01 UTC' as _time, "e" as dummy_column  union all
select  timestamp '2020-06-16 00:00:02 UTC' as _time, "f" as dummy_column  

The table:

enter image description here

The schema:

enter image description here

The details:

enter image description here

In order to select only one date from your timestamp field (_time), you can do as follows:

SELECT * FROM `project.dataset.table` WHERE DATE(_time) = "2020-06-15" 

And the output,

enter image description here

As it is shown above the output is as you desired.

Moreover, as an extra information I would like to encourage you to have a look at this documentation about partition by.