7
votes

We want to start using "Partitioned tables" in BQ But documentation(https://cloud.google.com/bigquery/docs/partitioned-tables) says that using "Streaming inserts" possible only

if the partitioning value is up to 7 days in the past, up to 3 days in the future,

In our case, we have some data which could have q partition value more than 7 days in the past.

We save data via BigQuery REST api

Does it mean that we can't use the partition tables or is there some other workaround this? How to save the data which is out of bounds(7days3days) for a partition table?

In general, the idea is: we have a table with streaming data(~100 records per min) and we want to stream data directly to partitions and then use the partitions for analytical queries

1
Streaming is generally for "live" data. What is the scenario such that you have week-old live data?Elliott Brossard
You may known mine: we have future events up to 6 months, eg schedulers. This limitation is a joke on BQ. Would love to be removed.Pentium10
@ElliottBrossard this is not a daily based situation, but possible. just an example - the system which is pushing events to BQ was down for 8 days, we use "created_at" as a partition key, when we run it again - we are getting all events for last 8 days and in this situation, we lose 1 day of events. The problem with a partition key which could be a different date for an event and this date could be at any point in time depends on requirementsNick Bilozerov
We are trying to remove this limitation as soon as possible. There are some implications around how long streamed data will take before it gets properly partitioned and provides the cost advantage. In general we want to keep the working set of number of partitions that are affected by streaming small so that we can keep the partitioning cost for the system under control.Pavan Edara
@PavanEdara thanks for the answer! Do you have any estimates when it would be done, for us partitioning is one the top priorities and would be good to know should we find some workaround or we should just wait?Nick Bilozerov

1 Answers

6
votes

To summarize the comment thread:

  • It's not possible to stream to partitions beyond [7 days in the past, 3 days in the future]. This is a performance related limitation, and the team is working to remove it.

  • Workaround: Stream your data to a non-partitioned table, and from there insert into the partitioned one.