0
votes


Im having an issue with clustered tables in BigQuery (with date partitions). I have a table that is clustered by a column named entity_id. The thing is, i expect to see a bytes read reduction when making queries filtered by these clustered column, but according with the BigQuery Web UI it's doing a fullscan anyway.

For example:
SELECT * FROM project.usersDataset.users_cluster WHERE entity_id = '405849241' LIMIT 1000;
Returns: "Query complete (0.570 sec elapsed, 862.94 MB processed)"
This is actually the full table size (862,94 MB)

This is the table configuration: Table configuration img

EDIT: I keep going on tests and i found that sometimes, some bytes read are saved, but not too much:
Query from BigQuery Web Ui I was expecting a bigger amount of bytes cost to be saved (returned 1 entry and scanned 719MB of 862MB of the table) but nothing guaranteed these in the bigquery documentation.

Does anyone have a clue on what could be happening?
Thanks!

1
I asked a similar question in this link stackoverflow.com/questions/53980953/…, Can you also provide same screenshot from your web UI to help get to the bottom of this - Tamir Klein
Clustering is working only on partition tables, and kicks in generally above 1GB of data sets. - Pentium10
Yes Tamir, it's quite similar to your problem. Actually (as i edited the post recently) i continue testing and i found that sometimes some bytes cost reduction is been made (719MB of 862MB of the total table and returned 1 row). I suppose i was expecting a bigger cost save, but nothing guaranteed these in the bigquery documentation and as Pentium10 points out maybe the amount of data doesn't help neither. Thanks both! - Marco Lotto

1 Answers

0
votes

From BigQuery documentation provided in this link

Features under development

Support for clustering non-partitioned tables.

Please check you table is cluster and partition

Note: Cluster will also be used when no WHERE condition per BigQuery documentation