Partitioned tables in BigQuery

Question

I was wondering what the usage of using a partitioned table in BigQuery is. It seems most of the queries seem to take about the same time to finish regardless of size (ignoring extremes, I'm generalizing), is this mainly a matter of using it to reduce costs on the bytes processed, or what is the main use case of partitioning tables in BQ?

https://cloud.google.com/bigquery/docs/creating-column-partitions

It's cheaper to query of course. Also, using the old way of date sharded tables (_YYYYMMDD) is cumbersome/clunky to query and you hit its limits pretty quickly too. It simplifies queries by presenting just one table to the user rather than hundreds. — Graham Polley
@GrahamPolley got it, thanks for the comment. Do you find yourself using the BQ partitions at all? If so, what's an example use-case of yours. — David542
Yes, all the time. e.g. cloud.google.com/bigquery/docs/doubleclick-publisher-transfer — Graham Polley

Pentium10 Pentium10 · Accepted Answer · 2018-02-09T08:02:48

There are multiple benefits, mainly costs.

by writing a query to read only eg: 7 days of partitions instead of 7 years you have lower costs
partitions you don't touch for older than 90 days are at lower costs
you can clearly reload a day's data much more easier than having to work around
you are still recommended to use YEARly tables eg mytable_2018, but you are no longer required to have daily tables eg: mytable_20180101, this further leads to have simpler queries, also no longer a problem to read more than 1000 tables (which is a hard limit).
when you modify schema, you need to modify a few tables, you no longer need to script alters on thousands of table
this also means it's lover bytes processed and in the cloud platform can be better optimized and needs fewer resources
by reorganizing data into partitioned tables the query times will benefit in the future. As customers will move data, the cloud engineering team will optimize the service for better usage.
you see clear cost wise benefits if your existing data is at least a couple of terabytes.

Partitioned tables in BigQuery

1 Answers