3
votes

I'm working with a BigQuery partitioned table. The partition is based on a Timestamp column in the data (rather than ingestion-based). We're streaming data into this table at a rate of several million rows per day.

We noticed that our queries based on specific days were scanning much more data than they should in a partitioned table.

Here is the current state of the UNPARTITIONED partition:

Current state of the UNPARTITIONED partition

I'm assuming that little blip at the bottom-right is normal (streaming buffer for the rows inserted this morning), but there is this massive block of data between mid-November and early-December that lives in the UNPARTITIONED partition, instead of being sent to the proper daily partitions (the partitions for that period don't appear to exist at all in __PARTITIONS_SUMMARY__).

My two questions are:

  • Is there a particular reason why these rows would not have been partitioned correctly, while data before and after that period is fine?

  • Is there a way to 'flush' the UNPARTITIONED partition, i.e. force BigQuery to dispatch the rows to their correct daily partition?

1
Debugging: Did you try recreating the table? Did this solve it? If this is a bug - please try the issue tracker for follow up.Felipe Hoffa
Did you find the answers to your question? I am facing a similar issue where the data is present in the UNPARTITIONED partition for a long time. Is there a time limit on when this data will be moved to the corresponding partition or will it wait infinitely for the size to exceed threshold to be moved?Sudarshan Murthy
@SudarshanMurthy Sadly the only fix I found at the time was to follow Felipe's advice by copying the table to a new one—this completely fixed the issue for me.radiumhead

1 Answers

0
votes

I faced a similar type of issue where a lot of rows stayed unpartitioned in a column-based partitioned table. So, what I observed that some records are not partitioned due to the source of the streaming insert. For the soulition, I update the table using the update and set a partitioned date where the partitioned column date is null. For safer side make sure that partitioned date column should not be nullable.