6
votes

I have a table with 340GB of data, but we use only last one week of data. So to minimize the cost planning to move this data to partition table or shard tables.

I have done some experiment with shard tables and partition. I have created partition table and loaded two days worth of data(two partitions) and created two shard tables(Individual tables). I tried to pull last two days worth of data.

Full table - 27sec Partition Table - 33 sec shard tables - 91 sec

Please let me know which way is best. Based on the experiment result is giving quick when I run against full table but full table will scan.

Thanks,

2
Can you please provide the two queries you are using for comparison? It would be also useful to know the two job ids. (I don't need the project id). - Pavan Edara

2 Answers

5
votes

From GCP official documentation on Partitioning versus Sharding you should use Partitioned tables.

Partitioned tables perform better than tables sharded by date. When you create date-named tables, BigQuery must maintain a copy of the schema and metadata for each date-named table. Also, when date-named tables are used, BigQuery might be required to verify permissions for each queried table. This practice also adds to query overhead and impacts query performance. The recommended best practice is to use partitioned tables instead of date-sharded tables.

0
votes

The difference in performance seems to be due to some background optimizations that have run on the non-partitioned table, but are yet to run on the partitioned table (since the data is newer).