Assign points to polygons effeciently

Question

I have table of polygons (thousands), and table of points (millions). Both tables have GIST indexes on geometry columns. Important this is, polygons do not overlap, so every point is contained by exactly one polygon. I want to generate table with this relation (polygon_id + point_id).

Trivial solution of course is

SELECT a.polygon_id, p.point_id
FROM my_polygons a
JOIN my_points p ON ST_Contains(a.geom, p.geom)

This works, but I think it is unnecessary slow, since it matches every polygon with every point - it does not know that every point can belong to one polygon only.

Is there any way to speed things up?

I tried looping for every polygon, selecting points by ST_Contains, but only those not already in the result table:

CREATE TABLE polygon2point (polygon_id uuid, point_id uuid);

DO $$DECLARE r record;
BEGIN
    FOR r IN SELECT polygon_id, geom
         FROM my_polygon             
    LOOP

    INSERT INTO polygon2point (polygon_id, point_id) 
    SELECT r.polygon_id, p.point_id
    FROM my_points p
    LEFT JOIN polygon2point t ON p.point_id = t.point_id
    WHERE t.point_id IS NULL AND ST_Contains(r.geom, p.geom);

    END LOOP;
END$$;

This even slower than trivial JOIN approach. Any ideas?

I don't know postgis so can you clarify: is an index used on my_points when you call ST_Contains? What is "unnecessarily slow", can you give us the actual times? And there are only thousands (how many?) of polygons, and each contains only one point (which I assume is just a row) from the points table, is that correct? Maybe an EXPLAIN ANALYZE would be useful. — 404
Yes, both tables have geometry columns indexed. Also, each polygon can contain many points, but one point is inside one polygon. Exact numbers do not matter that much, I have multiple datasets and it depends on machine. For ~5000 points and ~1m points, it takes about 15 seconds on my db server. — rouen
What kind of index exactly did you create on those two columns? Please edit your question and add the create table statements for both tables including all create index statements. According to the manual ST_Contains() should be able to use GIST indexes on those columns. And as 404 already said: the execution plan generated using explain (analyze, buffers) would be helpful as well — a_horse_with_no_name

JGH JGH · Accepted Answer · 2019-03-25T11:56:27

A way to increase the speed is to subdivide the polygons into smaller ones.

You would create a new table (or a materialized view should the polygon change often), index it, and then run the query. If the subdivisions have 128 vertices or less, the data will, by default, be stored uncompressed on disk, making the queries even faster.

CREATE TABLE poly_subdivided AS 
   SELECT ST_SUBDIVIDE(a.geom, 128) AS geom , a.polygon_id 
   FROM poly;

CREATE INDEX poly_subdivided_geom_idx ON poly_subdivided  USING gist(geom);

ANALYZE poly_subdivided;

SELECT a.polygon_id, p.point_id
FROM poly_subdivided a
JOIN my_points p ON ST_Contains(a.geom, p.geom)

Here is a great article on the topic.

Assign points to polygons effeciently

1 Answers