0 votes

I have uploaded some large tables to BigQuery and can run queries on them. I have successfully reduced costs horizontally by scanning only the specific columns I need rather than using SELECT *.

Are there any ways to limit the data scanned vertically as well? I can see that using LIMIT will not help:

"Applying a LIMIT clause to a SELECT * query does not affect the amount of data read. You are billed for reading all bytes in the entire table."

Are there any other ways of reducing the number of records BigQuery scans for a given query? Perhaps by means of uploading (and correctly naming) many smaller tables rather than one large one, or through specific BigQuery SQL?

In case it is relevant, my files are in Parquet format.


2 Answers

2 votes

Check partitioning and clustering in BigQuery.

https://cloud.google.com/bigquery/docs/partitioned-tables

https://cloud.google.com/bigquery/docs/clustered-tables (clustering also works nicely for cost reduction in combination with LIMIT)
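
As a rough sketch (the dataset, table, and column names here are hypothetical): create the table partitioned on a date/timestamp column and clustered on a column you often filter by, then make sure your queries filter on the partitioning column so BigQuery only scans the matching partitions.

    -- Hypothetical names throughout; adjust to your own schema.
    CREATE TABLE mydataset.events
    PARTITION BY DATE(event_ts)
    CLUSTER BY user_id
    AS SELECT * FROM mydataset.events_staging;

    -- Only the partitions covering January 2023 are scanned and billed.
    SELECT user_id, event_ts, payload
    FROM mydataset.events
    WHERE DATE(event_ts) BETWEEN DATE '2023-01-01' AND DATE '2023-01-31'
      AND user_id = 'abc123';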

0 votes

I think this will be helpful.

There are two basic ways to cut what is scanned: reduce the number of rows that are scanned and reduce the number of columns that are scanned.

To reduce the number of rows, the obvious way is LIMIT (though, as quoted in the question, LIMIT alone does not reduce the bytes billed). If that doesn't help, another approach is to look for a pattern in the dataset: for example, if you have a date column, use the BETWEEN operator to select only the required range.
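
As a sketch of the date-column idea (reusing the hypothetical events table from the first answer, partitioned by DATE(event_ts)): when the BETWEEN filter is on the partitioning column, BigQuery prunes the partitions outside the range and bills only that slice of the table; the same filter on an unpartitioned table would still scan everything.

    -- Assumes the hypothetical partitioned table from the first answer.
    -- Only one week of partitions is read, so bytes billed drop accordingly.
    SELECT user_id, COUNT(*) AS events
    FROM mydataset.events
    WHERE DATE(event_ts) BETWEEN DATE '2023-01-01' AND DATE '2023-01-07'
    GROUP BY user_id;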

Another way is to split your data into separate tables. This limits the amount of data you process, but running queries across several tables gets a bit trickier.
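
One common way to do this in BigQuery is date-sharded tables that share a name prefix (for example, the hypothetical mydataset.events_20230101, mydataset.events_20230102, ...). You can then query a subset of them in one statement using a wildcard table and the _TABLE_SUFFIX pseudo column:

    -- Only the shards whose suffix falls in the range are scanned.
    SELECT user_id, event_ts
    FROM `mydataset.events_*`
    WHERE _TABLE_SUFFIX BETWEEN '20230101' AND '20230107';

That said, Google's documentation generally recommends partitioned tables (see the first answer) over date-sharded tables when you have the choice.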