1
votes

I have a table that is range partitioned on the integer representation of 3-character geohashes. I need to select multiple partitions from this table, however, the query is reading the whole table before giving me the required partitions.

Query:

WITH
  hashes AS (
  SELECT
    ['ezw',
    'ezx',
    'ezy',
    'ezz'] AS hash_list)
SELECT
  *
FROM
  hashes,
  UNNEST(hashes.hash_list) AS hash_str,
  `project.dataset.partitioned_table`
WHERE
  geohash_num = ABS(MOD(farm_fingerprint(hash_str),4000))

The three-character geohashes are in hash_list, and abs(mod(farm_fingerprint(geohash),4000)) is how I get the numeric representation (<4000) of any geohash string.

The project.dataset.partitioned_table is partitioned on geohash_num, this column has numbers from 1 to 4000.

I would expect this query to fetch only the needed numeric partitions corresponding to the geohashes in hash_list, which would correspond to 961, 1234, 2684, 3606 but it reads all partitions before giving me the 4 needed partitions.

Does anyone have any suggestions on how to have it read only the needed partitions and not the whole table?

Clarification: This partition pruning works when I have only a single geohash,the following query reads only one partition and not the whole table, even though the where filter does not have a constant predicate :

WITH
  hashes AS (
  SELECT
    'ezw' AS hash_str)
SELECT
  *
FROM
  hashes,
  `freesolarcalc.Water_body_data.lake_geo33clustered_partitioned_dbscan`
WHERE
  geohash_num = ABS(MOD(farm_fingerprint(hash_str),4000))

The issue arises when I have to filter on multiple geohashes corresponding to multiple partition values.

1

1 Answers

1
votes

Partition pruning works only with constant expressions. You can select hash_str into variable and use it like SET my_hash_str = (SELECT ...) and then ... WHERE geohash_num = my_hash_str