I have a table t
in BigQuery that contains ~5billion rows (~80TB) and it is partitioned on column dateTimeCreated
which is of type TIMESTAMP and is partitioned by DAY. The table contains data for 5 years so no more than 1825 partitions.
I'd like to find out how many rows exist in the table per day so I crafted this SQL query:
select timestamp_trunc(datetimecreated,DAY),count(*)
from `p.d.t`
where datetimecreated > '2000-01-01'
group by 1
order by 1 desc
I was hoping that BigQuery would be able to return the results rapidly because this is basically counting the number of rows in each partition which, I would expect, is a tally that BigQuery maintains as internal metadata anyway (that's certainly my experience when using ingestion time partitioned tables).
Unfortunately that seems to not be the case. It took BigQuery 73s to return the result:
Query complete (1 min 13 sec elapsed, 37.4 GB processed)
I'm curious if there's a more efficient way to query this table. If it were an ingestion-time partitioned table my query would be:
select _PARTITION_DATE,count(*)
from `p.d.t`
where datetimecreated > '2000-01-01'
group by 1
order by 1 desc
which I'm confident would return very quickly. This isn't an ingestion-time partitioned table though.
Is there a more efficient method to achieve my desired result?
Another question,, does BigQuery provide queryable metadata per partition that includes the cardinality of the partition?