0
votes

I'm doing a fairly conventional point/polygon spatial join in order to determine the number of points that fall in each polygon. The result (from the left join) includes all of the polygons + the count where there is an intersection. But it doesn't return the polygon + 0 or null where there is no intersection. I'd like to still have the polygon geometry as it looks better on a map (vs. just a missing area).

 WITH
  geoidandcount AS(
  SELECT
    a.geo_id,
    COUNT(b.latitude) AS count
  FROM
    `polygon_dataset` a
  LEFT OUTER JOIN -- ERROR
    `point_dataset` b
  ON
    st_contains(a.the_geom,
      ST_GEOGPOINT(b.longitude,
        b.latitude))
  GROUP BY
    a.geo_id )
SELECT
  a.*,
  b.the_geom,
  a.count,
FROM
  geoidandcount a
JOIN
  `polygon_dataset` b
USING
  (geo_id)

I had to do the CTE as you can't group by geography.

enter image description here

1

1 Answers

2
votes

Yes, BigQuery cannot yet optimize geospatial OUTER JOIN, which is why you get this error. The workaround is to do geospatial INNER JOIN followed by OUTER JOIN on some unique ID in the OUTER table.

To demonstrate on simpler example, let's take this part of the query:

  SELECT
    a.geo_id,
    b.latitude
  FROM
    `polygon_dataset` a
  LEFT OUTER JOIN -- ERROR
    `point_dataset` b
  ON
    st_contains(a.the_geom, ST_GEOGPOINT(b.longitude, b.latitude))

Assuming geo_id is unique in the left table, this can be replaced with

  SELECT
    a.geo_id,
    c.latitude
  FROM
    `polygon_dataset` a
  LEFT OUTER JOIN
  (
    SELECT 
      a.geo_id
      b.latitude
    FROM
      `polygon_dataset` a
    JOIN
      `point_dataset` b
    ON
      st_contains(a.the_geom, ST_GEOGPOINT(b.longitude, b.latitude))
  ) c
  ON a.geo_id = c.geo_id

Here the count would be NULL when nothing is matched on the right, so let's use COALESCE(count, 0) to get 0.

The full query will be something like

  SELECT
    a.geo_id,
    a.the_geom,
    COALESCE(c.count, 0) AS count
  FROM
    `polygon_dataset` a
  LEFT OUTER JOIN
  (
    SELECT 
      a.geo_id
      COUNT(b.latitude) as count
    FROM
      `polygon_dataset` a
    JOIN
      `point_dataset` b
    ON
      st_contains(a.the_geom, ST_GEOGPOINT(b.longitude, b.latitude))
    GROUP BY
      geo_id
  ) c
  ON a.geo_id = c.geo_id