I'm working with a BigQuery partitioned table. The partition is based on a Timestamp
column in the data (rather than ingestion-based). We're streaming data into this table at a rate of several million rows per day.
We noticed that our queries based on specific days were scanning much more data than they should in a partitioned table.
Here is the current state of the UNPARTITIONED partition:
I'm assuming that little blip at the bottom-right is normal (streaming buffer for the rows inserted this morning), but there is this massive block of data between mid-November and early-December that lives in the UNPARTITIONED partition, instead of being sent to the proper daily partitions (the partitions for that period don't appear to exist at all in __PARTITIONS_SUMMARY__
).
My two questions are:
Is there a particular reason why these rows would not have been partitioned correctly, while data before and after that period is fine?
Is there a way to 'flush' the UNPARTITIONED partition, i.e. force BigQuery to dispatch the rows to their correct daily partition?