We have a pipeline set to receive data from Apache Alooma to Bigquery (partitioned tables based on _PARTITIONTIME).
Observation so far is data is not directly dumped into _PARTITIONTIME column rather it first goes into 'null' partition and then gradually dumped into _PARTITIONTIME column .
example:
Query 1:
select DATE(_PARTITIONTIME) , count(1) from `<table>` where
FORMAT_DATETIME( "%Y-%m-%d %T" , <col1>) >= "2019-05-07 00:02:00" and FORMAT_DATETIME( "%Y-%m-%d %T",<col1>)<= "2019-05-07 23:59:59"
group by 1
Output :
1 null 21492
2 2019-05-07 6633
around 3 minutes reran the query and found the output as this
1 null 26355
2 2019-05-07 6633
Note -- Pulled record with a unique key from this partition and ran Query 2.
after 10-15 minutes again reran the query with the below results:
1 2019-05-07 28125
2 null 16660
To support the observation ran below query
Query 2:
select * from `<table>` where unique_col >= '5cd0d13d0ae0' and
DATE(_PARTITIONTIME) = '2019-05-07'
--0 record
select * from `<table>` where unique_col >= '5cd0d13d0ae0' and
_PARTITIONTIME is null
--output appeared
record found in _PARTITIONTIME column after 4 min
select * from `<table>` where unique_col >= '5cd0d13d0ae0' and
DATE(_PARTITIONTIME) = '2019-05-07'
--output appeared
As per the above observation, seems the data first gets into the temporary null partition and from null to actual partition column. Is this understanding correct?