I have a table that is range partitioned on the integer representation of 3-character geohashes. I need to select multiple partitions from this table, however, the query is reading the whole table before giving me the required partitions.
Query:
WITH
hashes AS (
SELECT
['ezw',
'ezx',
'ezy',
'ezz'] AS hash_list)
SELECT
*
FROM
hashes,
UNNEST(hashes.hash_list) AS hash_str,
`project.dataset.partitioned_table`
WHERE
geohash_num = ABS(MOD(farm_fingerprint(hash_str),4000))
The three-character geohashes are in hash_list
, and abs(mod(farm_fingerprint(geohash),4000))
is how I get the numeric representation (<4000) of any geohash string.
The project.dataset.partitioned_table
is partitioned on geohash_num
, this column has numbers from 1 to 4000.
I would expect this query to fetch only the needed numeric partitions corresponding to the geohashes in hash_list
, which would correspond to 961, 1234, 2684, 3606 but it reads all partitions before giving me the 4 needed partitions.
Does anyone have any suggestions on how to have it read only the needed partitions and not the whole table?
Clarification: This partition pruning works when I have only a single geohash,the following query reads only one partition and not the whole table, even though the where
filter does not have a constant predicate :
WITH
hashes AS (
SELECT
'ezw' AS hash_str)
SELECT
*
FROM
hashes,
`freesolarcalc.Water_body_data.lake_geo33clustered_partitioned_dbscan`
WHERE
geohash_num = ABS(MOD(farm_fingerprint(hash_str),4000))
The issue arises when I have to filter on multiple geohashes corresponding to multiple partition values.