0 votes

I have uploaded some large tables to BigQuery and can run queries on them. I have successfully reduced costs horizontally by scanning only the specific columns I need rather than using SELECT *.

Are there any ways to limit the data scanned vertically as well? I can see that using LIMIT will not help:

"Applying a LIMIT clause to a SELECT * query does not affect the amount of data read. You are billed for reading all bytes in the entire table."

Are there any other ways of reducing the number of records BigQuery scans for a given query? Perhaps by means of uploading (and correctly naming) many smaller tables rather than one large one, or through specific BigQuery SQL?

In case it is relevant, my files are in Parquet format.


2 Answers

2 votes

Check partitioning and clustering in BigQuery.

https://cloud.google.com/bigquery/docs/partitioned-tables

https://cloud.google.com/bigquery/docs/clustered-tables (clustering also works nicely for cost reduction in combination with LIMIT)
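
As a rough sketch (the dataset, table, and column names here are hypothetical): create the table partitioned on a date/timestamp column and clustered on a column you often filter by, then make sure your queries filter on the partitioning column so BigQuery only scans the matching partitions.

    -- Hypothetical names throughout; adjust to your own schema.
    CREATE TABLE mydataset.events
    PARTITION BY DATE(event_ts)
    CLUSTER BY user_id
    AS SELECT * FROM mydataset.events_staging;

    -- Only the partitions covering January 2023 are scanned and billed.
    SELECT user_id, event_ts, payload
    FROM mydataset.events
    WHERE DATE(event_ts) BETWEEN DATE '2023-01-01' AND DATE '2023-01-31'
      AND user_id = 'abc123';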

0 votes

I think this will be helpful.

There are two basic ways to cut what is scanned: reduce the number of rows that are scanned and reduce the number of columns that are scanned.

To reduce the number of rows, the obvious way is LIMIT (though, as quoted in the question, LIMIT alone does not reduce the bytes billed). If that doesn't help, another approach is to look for a pattern in the dataset: for example, if you have a date column, use the BETWEEN operator to select only the required range.
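
As a sketch of the date-column idea (reusing the hypothetical events table from the first answer, partitioned by DATE(event_ts)): when the BETWEEN filter is on the partitioning column, BigQuery prunes the partitions outside the range and bills only that slice of the table; the same filter on an unpartitioned table would still scan everything.

    -- Assumes the hypothetical partitioned table from the first answer.
    -- Only one week of partitions is read, so bytes billed drop accordingly.
    SELECT user_id, COUNT(*) AS events
    FROM mydataset.events
    WHERE DATE(event_ts) BETWEEN DATE '2023-01-01' AND DATE '2023-01-07'
    GROUP BY user_id;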

Another way is to split your data into separate tables. This limits the amount of data you process, but running queries across several tables gets a bit trickier.
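
One common way to do this in BigQuery is date-sharded tables that share a name prefix (for example, the hypothetical mydataset.events_20230101, mydataset.events_20230102, ...). You can then query a subset of them in one statement using a wildcard table and the _TABLE_SUFFIX pseudo column:

    -- Only the shards whose suffix falls in the range are scanned.
    SELECT user_id, event_ts
    FROM `mydataset.events_*`
    WHERE _TABLE_SUFFIX BETWEEN '20230101' AND '20230107';

That said, Google's documentation generally recommends partitioned tables (see the first answer) over date-sharded tables when you have the choice.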