1
votes

I have a use case in which I do the following:

  1. Insert some rows into a BigQuery table (t1) which is date partitioned.
  2. Run some queries on t1 to aggregate the data and store them in another table.

In the above use case, I faced an issue today where the queries I run had some discrepancy in the aggregated data. When I executed the same queries some time later from the Web UI of BigQuery, the aggregation were fine. My suspicion is that some of the inserted rows were not available for the query.

I read this documentation for BigQuery data availability. I have the following doubts in this:

  1. The link says that "Streamed data is available for real-time analysis within a few seconds of the first streaming insertion into a table". Is there an upper limit on the number of seconds to wait before it is available for real time analysis?
  2. From the same link: "Data can take up to 90 minutes to become available for copy and export operations". Does the following operations come under this restriction?
    • Copy the result of a query to another table
    • Exporting the result of a query to a csv file in cloud storage
  3. Also from the same link- "when streaming to a partitioned table, data in the streaming buffer has a NULL value for the _PARTITIONTIME pseudo column". Does this mean that I should not use _PARTITIONTIME in the queries till data is present in the streamingBuffer?

Can somebody please clarify these?

2

2 Answers

1
votes

For the questions:

Does the following operations come under this restriction?

Copy the result of a query to another table
Exporting the result of a query to a csv file in cloud storage

The results of a query are immediately available for any operation (like copy and export) - even if that query had been ran on streamed data still in the buffer.

2
votes

You can use _PARTITIONTIME is null to detect which rows are in buffer. You can actually use this logic to further UNION this buffer to a date you wish (like today). You could do wire in some logic that reads the buffer and where time is null it will set a time for the rest of the query logic.

This buffer is by design a bit delayed, but if you need immediate access to data you need to use the IS NULL trick to be able to query it.