2
votes

I am trying to find intersection between incidents(polygons) and watchzones(circles - points and radius) using PostGIS. The baseline data is going to be somewhere like more than 10 000 polygons and 500 000 circles. Also, I am quite new to PostGIS.

I have tried a few things but the execution is taking quite long. Can someone please suggest any optimisations or a better way using PostGIS only. Here is what I have tried -

1. Using Geometry datatype: I have stored the incidents and watchzones in type geometry. created GIST index on them, used ST_DWITHIN to find the intersection.

The output with 1 incident and 500 000 watchzones took like 6.750sec. Here, the time taken is optimum, but the problem is that I have radius in meters and with geometry type ST_DWithin requires it to be in SRID unit. I am unable to figure out this conversion.

CREATE TABLE incident (
 incident_id SERIAL NOT NULL, 
 incident_name VARCHAR(20), 
 incident_span GEOMETRY(POLYGON, 4326), 
 CONSTRAINT incident_id PRIMARY KEY (incident_id)
);
CREATE TABLE watchzones (
 id SERIAL NOT NULL, 
 date_created timestamp with time zone DEFAULT now(), 
 latitude NUMERIC(10, 7) DEFAULT NULL, 
 Longitude NUMERIC(10, 7) DEFAULT NULL, 
 radius integer, 
 position GEOMETRY(POINT, 4326), 
 CONSTRAINT id PRIMARY KEY (id)
);

CREATE INDEX ix_spatial_geom on watchzones using gist(position);
CREATE INDEX ix_spatial_geom_1 on incident using gist(incident_span);


Insert into incident values (
   1, 
   'test', 
   ST_GeomFromText('POLYGON((152.945470916 -29.212227933,152.942130026 -29.213431145,152.939345911 -29.2125423759999,152.935144791 -29.21454003,152.933185494 -29.2135838469999,152.929481762 -29.216065516,152.929698621 -29.217402937,152.927245999 
-29.219576,152.921539 -29.217676,152.918487996 -29.2113786959999,152.919254355 -29.206029929,152.919692387 -29.2027824419999,152.936020197 -29.207567346,152.944901258 -29.207729953,152.945470916 
-29.212227933))', 
     4326
     )
     );

insert into watchzones  
  SELECT generate_series(1, 500000) AS id, 
         now(), 
         -29.21073, 
         152.93322, 
         '50', 
         ST_GeomFromText('POINT( 152.93322 -29.21073)', 4326);


explain analyze SELECT wz.id, 
       i.incident_id 
FROM watchzones wz, 
     incident i 
WHERE ST_DWithin(incident_span,position,wz.radius);

    "Nested Loop  (cost=0.14..227467.00 rows=42 width=8) (actual time=0.142..1506.476 rows=500000 loops=1)"
"  ->  Seq Scan on watchzones wz  (cost=0.00..11173.00 rows=500000 width=40) (actual time=0.109..47.822 rows=500000 loops=1)"
"  ->  Index Scan using ix_spatial_geom_1 on incident i  (cost=0.14..0.42 rows=1 width=284) (actual time=0.002..0.002 rows=1 loops=500000)"
"        Index Cond: (incident_span && st_expand(wz."position", (wz.radius)::double precision))"
"        Filter: ((wz."position" && st_expand(incident_span, (wz.radius)::double precision)) AND _st_dwithin(incident_span, wz."position", (wz.radius)::double precision))"
"Planning time: 0.150 ms"
"Execution time: 1523.312 ms"

2. Using Geography data type:

The output with 1 incident and 500 000 watchzones here, took like 29.987sec which is quite slow. Please note that I have tried this with both GIST and BRIN indexes and also ran VACUUM ANALYZE on the tables.

CREATE TABLE watchzones_geog 
         ( 
            id SERIAL PRIMARY KEY, 
            date_created TIMESTAMP with time zone DEFAULT now(), 
            latitude NUMERIC(10, 7) DEFAULT NULL, 
            longitude NUMERIC(10, 7) DEFAULT NULL, 
            radius INTEGER, 
            position geography(point) 
         );


CREATE INDEX watchzones_geog_gix ON watchzones_geog USING GIST (position);

insert into watchzones_geog
SELECT generate_series(1,500000) AS id, now(),-29.21073,152.93322,'50',ST_GeogFromText('POINT(152.93322 -29.21073)');

 CREATE TABLE incident_geog (
    incident_id    SERIAL PRIMARY KEY,
    incident_name   VARCHAR(20),
    incident_span      GEOGRAPHY(POLYGON)
);

    CREATE INDEX incident_geog_gix ON incident_geog USING GIST (incident_span);

Insert into incident_geog values (1,'test', ST_GeogFromText
('POLYGON((152.945470916 -29.212227933,152.942130026 -29.213431145,152.939345911 -29.2125423759999,152.935144791 -29.21454003,152.933185494 -29.2135838469999,152.929481762 -29.216065516,152.929698621 -29.217402937,152.927245999 
-29.219576,152.921539 -29.217676,152.918487996 -29.2113786959999,152.919254355 -29.206029929,152.919692387 -29.2027824419999,152.936020197 -29.207567346,152.944901258 -29.207729953,152.945470916 
-29.212227933))'));

explain analyze SELECT i.incident_id, 
       wz.id 
FROM   watchzones_geog wz, 
       incident_geog i 
WHERE  St_dwithin(position, incident_span, radius); 

"Nested Loop  (cost=0.27..348717.00 rows=17 width=8) (actual time=0.277..18551.844 rows=500000 loops=1)"
"  ->  Seq Scan on watchzones_geog wz  (cost=0.00..11173.00 rows=500000 width=40) (actual time=0.102..50.052 rows=500000 loops=1)"
"  ->  Index Scan using incident_geog_gix on incident_geog i  (cost=0.27..0.67 rows=1 width=711) (actual time=0.036..0.036 rows=1 loops=500000)"
"        Index Cond: (incident_span && _st_expand(wz."position", (wz.radius)::double precision))"
"        Filter: ((wz."position" && _st_expand(incident_span, (wz.radius)::double precision)) AND _st_dwithin(wz."position", incident_span, (wz.radius)::double precision, true))"
"Planning time: 0.155 ms"
"Execution time: 18587.041 ms"

3. I have also tried creating a circle using ST_Buffer(position, radius,'quad_segs=8') and then using ST_Intersects. With this the query takes more than a minute with both geometry and geography data types.

Would be great if someone can suggest a better way or some optimisations which would speed up the execution.

Thanks

1
On a side note, you insert lat=-29.21073, long=152.93322 but then create the point as ST_GeomFromText('POINT(-29.21073 152.93322)', 4326);, effectively swapping lat and long (it should be POINT(long lat))JGH
In the first query you query for 50 degrees around the point, while in the 2nd you query for 50 meters. The outputs of the two are different so it is invalid to compare the time. In both case, try adding EXPLAIN (ANALYZE, BUFFERS) to find why it is slow.JGH
As you pointed out, that one is 50 degrees and the other is 50 metres, is there is a way to convert metres to degrees? I searched a lot but couldn't find anything relevant.Shuchi Sethi
You can check this answer about the degrees to meter issue (simply put, casting to Geography is the easiest. To help you on the query efficient, you still need to edit the question to include the explain(analyze,buffers) output.JGH
Hi @JGH, I have added output from explain analyzeShuchi Sethi

1 Answers

0
votes

The query is fine, but your sample is wrong. First, let's note that a query optimized for 1 polygon might not be the same as the optimized one for several thousands.

The main issue is with the sample points. As it is, you have 500,000 points at the exact same location, so depending on the intersecting polygon, the query will return 0 or 500 000 results. Postgis starts by using the index to intersects points/polygons using a square box, and then refine the results by computing the true distance. Using your sample, it has to compute the distance 500,000 times, which is slow.

Using a point layer with random locations (within 1 degree), the query takes less than 1 second as it has to compute the distance for 20 locations only.

INSERT INTO watchzones_geog
SELECT generate_series(1,500000) AS id, now(),0,0,'50',
       ST_makePoint(152.93322+random(),-29.21073+random())::geography;


explain analyze SELECT i.incident_id, 
       wz.id 
FROM   watchzones_geog wz, 
       incident_geog i 
WHERE  St_dwithin(position, incident_span, radius); 
    Nested Loop  (cost=0.00..272424.01 rows=1 width=8) (actual time=25.956..921.846 rows=20 loops=1)
--------------------------------------------
   Join Filter: ((wz."position" && _st_expand(i.incident_span, (wz.radius)::double precision)) AND (i.incident_span && _st_expand(wz."position", (wz.radius)::double precision)) AND _st_dwithin(wz."position", i.incident_span, (wz.radius)::double precision, true))
   Rows Removed by Join Filter: 499980
   ->  Seq Scan on incident_geog i  (cost=0.00..1.01 rows=1 width=36) (actual time=0.009..0.009 rows=1 loops=1)
   ->  Seq Scan on watchzones_geog wz  (cost=0.00..11173.00 rows=500000 width=40) (actual time=0.006..65.625 rows=500000 loops=1)
 Planning time: 1.887 ms
 Execution time: 921.895 ms