0
votes

I was wondering what the usage of using a partitioned table in BigQuery is. It seems most of the queries seem to take about the same time to finish regardless of size (ignoring extremes, I'm generalizing), is this mainly a matter of using it to reduce costs on the bytes processed, or what is the main use case of partitioning tables in BQ?

https://cloud.google.com/bigquery/docs/creating-column-partitions

1
It's cheaper to query of course. Also, using the old way of date sharded tables (_YYYYMMDD) is cumbersome/clunky to query and you hit its limits pretty quickly too. It simplifies queries by presenting just one table to the user rather than hundreds.Graham Polley
@GrahamPolley got it, thanks for the comment. Do you find yourself using the BQ partitions at all? If so, what's an example use-case of yours.David542

1 Answers

3
votes

There are multiple benefits, mainly costs.

  • by writing a query to read only eg: 7 days of partitions instead of 7 years you have lower costs
  • partitions you don't touch for older than 90 days are at lower costs
  • you can clearly reload a day's data much more easier than having to work around
  • you are still recommended to use YEARly tables eg mytable_2018, but you are no longer required to have daily tables eg: mytable_20180101, this further leads to have simpler queries, also no longer a problem to read more than 1000 tables (which is a hard limit).
  • when you modify schema, you need to modify a few tables, you no longer need to script alters on thousands of table
  • this also means it's lover bytes processed and in the cloud platform can be better optimized and needs fewer resources
  • by reorganizing data into partitioned tables the query times will benefit in the future. As customers will move data, the cloud engineering team will optimize the service for better usage.
  • you see clear cost wise benefits if your existing data is at least a couple of terabytes.