I am trying to find intersection between incidents(polygons) and watchzones(circles - points and radius) using PostGIS. The baseline data is going to be somewhere like more than 10 000 polygons and 500 000 circles. Also, I am quite new to PostGIS.
I have tried a few things but the execution is taking quite long. Can someone please suggest any optimisations or a better way using PostGIS only. Here is what I have tried -
1. Using Geometry datatype: I have stored the incidents and watchzones in type geometry. created GIST index on them, used ST_DWITHIN to find the intersection.
The output with 1 incident and 500 000 watchzones took like 6.750sec. Here, the time taken is optimum, but the problem is that I have radius in meters and with geometry type ST_DWithin requires it to be in SRID unit. I am unable to figure out this conversion.
CREATE TABLE incident (
incident_id SERIAL NOT NULL,
incident_name VARCHAR(20),
incident_span GEOMETRY(POLYGON, 4326),
CONSTRAINT incident_id PRIMARY KEY (incident_id)
);
CREATE TABLE watchzones (
id SERIAL NOT NULL,
date_created timestamp with time zone DEFAULT now(),
latitude NUMERIC(10, 7) DEFAULT NULL,
Longitude NUMERIC(10, 7) DEFAULT NULL,
radius integer,
position GEOMETRY(POINT, 4326),
CONSTRAINT id PRIMARY KEY (id)
);
CREATE INDEX ix_spatial_geom on watchzones using gist(position);
CREATE INDEX ix_spatial_geom_1 on incident using gist(incident_span);
Insert into incident values (
1,
'test',
ST_GeomFromText('POLYGON((152.945470916 -29.212227933,152.942130026 -29.213431145,152.939345911 -29.2125423759999,152.935144791 -29.21454003,152.933185494 -29.2135838469999,152.929481762 -29.216065516,152.929698621 -29.217402937,152.927245999
-29.219576,152.921539 -29.217676,152.918487996 -29.2113786959999,152.919254355 -29.206029929,152.919692387 -29.2027824419999,152.936020197 -29.207567346,152.944901258 -29.207729953,152.945470916
-29.212227933))',
4326
)
);
insert into watchzones
SELECT generate_series(1, 500000) AS id,
now(),
-29.21073,
152.93322,
'50',
ST_GeomFromText('POINT( 152.93322 -29.21073)', 4326);
explain analyze SELECT wz.id,
i.incident_id
FROM watchzones wz,
incident i
WHERE ST_DWithin(incident_span,position,wz.radius);
"Nested Loop (cost=0.14..227467.00 rows=42 width=8) (actual time=0.142..1506.476 rows=500000 loops=1)"
" -> Seq Scan on watchzones wz (cost=0.00..11173.00 rows=500000 width=40) (actual time=0.109..47.822 rows=500000 loops=1)"
" -> Index Scan using ix_spatial_geom_1 on incident i (cost=0.14..0.42 rows=1 width=284) (actual time=0.002..0.002 rows=1 loops=500000)"
" Index Cond: (incident_span && st_expand(wz."position", (wz.radius)::double precision))"
" Filter: ((wz."position" && st_expand(incident_span, (wz.radius)::double precision)) AND _st_dwithin(incident_span, wz."position", (wz.radius)::double precision))"
"Planning time: 0.150 ms"
"Execution time: 1523.312 ms"
2. Using Geography data type:
The output with 1 incident and 500 000 watchzones here, took like 29.987sec which is quite slow. Please note that I have tried this with both GIST and BRIN indexes and also ran VACUUM ANALYZE on the tables.
CREATE TABLE watchzones_geog
(
id SERIAL PRIMARY KEY,
date_created TIMESTAMP with time zone DEFAULT now(),
latitude NUMERIC(10, 7) DEFAULT NULL,
longitude NUMERIC(10, 7) DEFAULT NULL,
radius INTEGER,
position geography(point)
);
CREATE INDEX watchzones_geog_gix ON watchzones_geog USING GIST (position);
insert into watchzones_geog
SELECT generate_series(1,500000) AS id, now(),-29.21073,152.93322,'50',ST_GeogFromText('POINT(152.93322 -29.21073)');
CREATE TABLE incident_geog (
incident_id SERIAL PRIMARY KEY,
incident_name VARCHAR(20),
incident_span GEOGRAPHY(POLYGON)
);
CREATE INDEX incident_geog_gix ON incident_geog USING GIST (incident_span);
Insert into incident_geog values (1,'test', ST_GeogFromText
('POLYGON((152.945470916 -29.212227933,152.942130026 -29.213431145,152.939345911 -29.2125423759999,152.935144791 -29.21454003,152.933185494 -29.2135838469999,152.929481762 -29.216065516,152.929698621 -29.217402937,152.927245999
-29.219576,152.921539 -29.217676,152.918487996 -29.2113786959999,152.919254355 -29.206029929,152.919692387 -29.2027824419999,152.936020197 -29.207567346,152.944901258 -29.207729953,152.945470916
-29.212227933))'));
explain analyze SELECT i.incident_id,
wz.id
FROM watchzones_geog wz,
incident_geog i
WHERE St_dwithin(position, incident_span, radius);
"Nested Loop (cost=0.27..348717.00 rows=17 width=8) (actual time=0.277..18551.844 rows=500000 loops=1)"
" -> Seq Scan on watchzones_geog wz (cost=0.00..11173.00 rows=500000 width=40) (actual time=0.102..50.052 rows=500000 loops=1)"
" -> Index Scan using incident_geog_gix on incident_geog i (cost=0.27..0.67 rows=1 width=711) (actual time=0.036..0.036 rows=1 loops=500000)"
" Index Cond: (incident_span && _st_expand(wz."position", (wz.radius)::double precision))"
" Filter: ((wz."position" && _st_expand(incident_span, (wz.radius)::double precision)) AND _st_dwithin(wz."position", incident_span, (wz.radius)::double precision, true))"
"Planning time: 0.155 ms"
"Execution time: 18587.041 ms"
3. I have also tried creating a circle using ST_Buffer(position, radius,'quad_segs=8')
and then using ST_Intersects. With this the query takes more than a minute with both geometry and geography data types.
Would be great if someone can suggest a better way or some optimisations which would speed up the execution.
Thanks
lat=-29.21073, long=152.93322
but then create the point asST_GeomFromText('POINT(-29.21073 152.93322)', 4326);
, effectively swapping lat and long (it should bePOINT(long lat)
) – JGHEXPLAIN (ANALYZE, BUFFERS)
to find why it is slow. – JGH