I have a date partitioned table (call it sample_table) with 2 columns, one to save dateTime in UTC and other to save timezone offset. I have a view on top of this table (call it sample_view). The view takes _partitiontime in from table and exposes that as partitionDate column and also there is another column customerDateTime which simply adds dateTime with timeZoneOffset.
When I query the sample_table directly using only _partitiontime bigquery scans far less data (131 MB).
select
containerName,
count(*)
from
[sample_project.sample_table]
where
_partitiontime between timestamp('2016-12-12') and timestamp('2016-12-19')
and customer = 'X'
and containerName = 'XXX'
group by containerName
;
But when I run same query on the table with dateTime column to scan according to customer's local date time big query scans more (211MB). I expected less than 131MB or equal to 131MB.
select
containerName,
count(*)
from
[sample_project.sample_table]
where
_partitiontime between timestamp('2016-12-12') and timestamp('2016-12-19')
and DATE_ADD(dateTime, 3600, 'SECOND' ) between timestamp('2016-12-12 08:00:00') and timestamp('2016-12-19 15:00:00')
and customer = 'X'
and containerName = 'XXX'
group by containerName
;
When I run similar query against the sample_view with partitionDate bigquery scans more (399MB)
select
containerName,
count(*)
from
[sample_project.sample_view]
where
partitionDate between timestamp('2016-12-12') and timestamp('2016-12-19')
and customer = 'X'
and containerName = 'XXX'
group by containerName
;
And when I run query against the view with partitionDate and use customerDateTime column as well bigquery scans even more (879MB)
select
containerName,
count(*)
from
[sample_project.sample_view]
where
partitionDate between timestamp('2016-12-12') and timestamp('2016-12-19') and customerDateTime between timestamp('2016-12-12 08:00:00') and timestamp('2016-12-19 15:00:00')
and customer = 'X'
and containerName = 'XXX'
group by containerName
;
I'm not too sure whether I'm scanning right partitions from any of the queries above. Why do I see the differences between these queries? Is exposing _partitiontime as a new column partitionDate a bad strategy? I'm not sure how else to use the partition date within Tableau without writing more queries. Please let me know if you require more details.