Aggregating last 30 days data in BigQuery

Question

I am checking the feasibility of moving from Redshift to BigQuery. I need help in implementing the below use case on BigQuery.

We have a by day product performance table which is a date partitioned table. It is called product_performance_by_day. There is a row for every product that was sold each day. Every day we process the data at the end of the day and put it in the partition for that day. Then we aggregate this by day performance data over the last 30 days and put it in the table called product_performance_last30days. This aggregation saves querying time and in the case of BigQuery will save the cost as well since it will scan less data.

Here is how we do it in Redshift currently -
We put the aggregated data in a new table e.g. product_performance_last30days_temp. Then drop the product_performance_last30days table and rename product_performance_last30days_temp to product_performance_last30days. So there is very minimal downtime for product_performance_last30days table.

How can we do the same thing in the BigQuery?

Currently, BigQuery does not support renaming tables or materialized views or table aliases. And since we want to save the aggregated data in the same table every day we cannot use destination table if the table is not empty.

as in november 2016, querying 1TB costs $5, having 30 such tables (100GB * 30=3TB) would cost $15 for a full scan. What we are talking about here? It looks damn cheap. What's your ultimate goal? — Pentium10

Pentium10 Pentium10 · Accepted Answer · 2016-11-08T09:10:44

You can overwrite the same table by using writeDisposition Specifies the action that occurs if the destination table already exists.

The following values are supported:

WRITE_TRUNCATE: If the table already exists, BigQuery overwrites the table data.
WRITE_APPEND: If the table already exists, BigQuery appends the data to the table.
WRITE_EMPTY: If the table already exists and contains data, a 'duplicate' error is returned in the job result. The default value is WRITE_EMPTY.

Each action is atomic and only occurs if BigQuery is able to complete the job successfully. Creation, truncation and append actions occur as one atomic update upon job completion.

For RENAMING tables look on this answer.

Aggregating last 30 days data in BigQuery

1 Answers