0 votes

I am using the PostGIS function ST_DWithin(geography gg1, geography gg2, double precision distance_meters) to find whether a point is within a specified distance of a polygon. I am running tests to see how long the query takes, and EXPLAIN shows that it is running sequential scans on the tables instead of using the BRIN or GiST indexes. Can someone suggest a way to optimise it?

Here are the tables:

Table 1 (incident_geog) with polygons:

CREATE TABLE public.incident_geog
(
    incident_id integer NOT NULL DEFAULT nextval('incident_geog_incident_id_seq'::regclass),
    incident_name character varying(20) COLLATE pg_catalog."default",
    incident_span geography(Polygon,4326),
    CONSTRAINT incident_geog_pkey PRIMARY KEY (incident_id)
)

CREATE INDEX incident_geog_gix
    ON public.incident_geog USING gist
    (incident_span)

Table 2 (watchzones_geog) with points and distances:

CREATE TABLE public.watchzones_geog
(
    id integer NOT NULL DEFAULT nextval('watchzones_geog_id_seq'::regclass),
    date_created timestamp with time zone DEFAULT now(),
    latitude numeric(10,7) DEFAULT NULL::numeric,
    longitude numeric(10,7) DEFAULT NULL::numeric,
    radius integer,
    "position" geography(Point,4326),
    CONSTRAINT watchzones_geog_pkey PRIMARY KEY (id)
)

CREATE INDEX watchzones_geog_gix
    ON public.watchzones_geog USING gist
    ("position")

SQL with ST_DWithin:

explain select i.incident_id,wz.id from watchzones_geog wz, incident_geog i where ST_DWithin(position,incident_span,wz.radius * 1000);

Output of EXPLAIN:

Nested Loop  (cost=0.26..418436.69 rows=1 width=8)
   ->  Seq Scan on watchzones_geog wz  (cost=0.00..13408.01 rows=600001 width=40)
   ->  Index Scan using incident_geog_gix on incident_geog i  (cost=0.26..0.67 rows=1 width=292)
         Index Cond: (incident_span && _st_expand(wz."position", ((wz.radius * 1000))::double precision))
         Filter: ((wz."position" && _st_expand(incident_span, ((wz.radius * 1000))::double precision)) AND _st_dwithin(wz."position", incident_span, ((wz.radius * 1000))::double precision, true))
Comments:

You need to output EXPLAIN ANALYZE, not EXPLAIN. Also, the query has ST_DWithin(position, incident_span, 50), but the query plan seems to show ST_DWithin(position, incident_span, (wz.radius * 1000)::double precision). – Evan Carroll

There is a seq scan because you don't filter anything in your query. You ask the DB to compare all records from one table (hence the seq scan) with the records within 50 meters of them from the other table (hence the index scan). – Grzegorz Grabek

2 Answers

1 vote

What your SQL actually does is find, for every point, the polygons within the specified distance. The result is a one-to-one correspondence between incident_geog.incident_id and watchzones_geog.id. Because it operates on every point, it uses a sequential scan on that table.

I guess you want to start from the polygons and look up points, so your SQL needs to swap the driving table:

explain select i.incident_id,wz.id from incident_geog i, watchzones_geog wz where ST_DWithin(position,incident_span,50);

We can see:

Nested Loop  (cost=0.27..876.00 rows=1 width=16)
   ->  Seq Scan on incident_geog i  (cost=0.00..22.00 rows=1200 width=40)
   ->  Index Scan using watchzones_geog_gix on watchzones_geog wz  (cost=0.27..0.70 rows=1 width=40)
         Index Cond: ("position" && _st_expand(i.incident_span, '50'::double precision))
         Filter: ((i.incident_span && _st_expand("position", '50'::double precision)) AND _st_dwithin("position", i.incident_span, '50'::double precision, true))

Because the join has to consider every pairing, there is always one table whose records are all traversed by a sequential scan. The results of these two queries are no different; the key is which table you start from when probing the other through its index.
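
As a minimal sketch of that point: once one side is filtered down to a few rows, the planner can reach both tables through indexes instead of scanning either in full. The WHERE wz.id = 42 filter below is a hypothetical value, not something from the question:

explain analyze select i.incident_id, wz.id
from watchzones_geog wz, incident_geog i
where wz.id = 42  -- hypothetical filter; the primary key index finds this row
  and ST_DWithin(wz."position", i.incident_span, wz.radius * 1000);  -- GiST index on incident_span handles the probe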

Maybe you can try a parallel query. parallel_tuple_cost is the planner's estimated cost of transferring one tuple from a parallel worker to the leader, so a lower value makes parallel plans more attractive. First, with parallel_tuple_cost set to 0:

SET parallel_tuple_cost TO 0;
explain analyze select i.incident_id,wz.id from incident_geog i, watchzones_geog wz where ST_DWithin(position,incident_span,50);

Nested Loop  (cost=0.27..876.00 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=1)
   ->  Seq Scan on incident_geog i  (cost=0.00..22.00 rows=1200 width=40) (actual time=0.002..0.002 rows=0 loops=1)
   ->  Index Scan using watchzones_geog_gix on watchzones_geog wz  (cost=0.27..0.70 rows=1 width=40) (never executed)
         Index Cond: ("position" && _st_expand(i.incident_span, '50'::double precision))
         Filter: ((i.incident_span && _st_expand("position", '50'::double precision)) AND _st_dwithin("position", i.incident_span, '50'::double precision, true))
 Planning time: 0.125 ms
 Execution time: 0.028 ms

Now discourage parallel plans by setting parallel_tuple_cost to 2:

SET parallel_tuple_cost TO 2;
explain analyze select i.incident_id,wz.id from incident_geog i, watchzones_geog wz where ST_DWithin(position,incident_span,50);

Nested Loop  (cost=0.27..876.00 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=1)
   ->  Seq Scan on incident_geog i  (cost=0.00..22.00 rows=1200 width=40) (actual time=0.001..0.001 rows=0 loops=1)
   ->  Index Scan using watchzones_geog_gix on watchzones_geog wz  (cost=0.27..0.70 rows=1 width=40) (never executed)
         Index Cond: ("position" && _st_expand(i.incident_span, '50'::double precision))
         Filter: ((i.incident_span && _st_expand("position", '50'::double precision)) AND _st_dwithin("position", i.incident_span, '50'::double precision, true))
 Planning time: 0.103 ms
 Execution time: 0.013 ms
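
Note that neither plan above actually launched parallel workers (there is no Gather node); on tables this small, the planner will not choose a parallel plan either way. As a rough sketch, these are the settings that usually have to cooperate before a parallel seq scan appears; the values below are illustrative assumptions, not tuned recommendations:

SET max_parallel_workers_per_gather = 4;  -- 0 disables parallel plans entirely
SET parallel_setup_cost = 0;              -- planner cost of launching workers
SET parallel_tuple_cost = 0;              -- planner cost per tuple passed to the leader
SET min_parallel_table_scan_size = 0;     -- smaller tables are never scanned in parallel

explain analyze select i.incident_id,wz.id from incident_geog i, watchzones_geog wz where ST_DWithin(position,incident_span,50);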
0 votes

A few general points:

  1. Use IDENTITY columns rather than manually setting sequences (a combined sketch of several of these points follows after the list).
  2. You don't need DEFAULT NULL::numeric; NULL is already the default for nullable columns.
  3. Make sure you VACUUM ANALYZE both tables after you load them.
  4. Don't use SQL-89 comma joins; instead, write out your INNER JOIN ... ON:

    SELECT i.incident_id,wz.id
    FROM watchzones_geog wz
    INNER JOIN incident_geog i
      ON ST_DWithin(wz.position,i.incident_span,50);
    
  5. In your EXPLAIN output you have wz.radius * 1000, but in your query you give the radius as 50. Which is it? Does the query still seq scan if you statically type in the radius?

  6. If you're not using latitude and longitude on the table, drop those two columns. There is no reason to store them twice.
  7. I would not use varchar(20); just use text. It's faster because there is no length check, and it's implemented the same way.
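
Pulling points 1, 2, 3, 6, and 7 together, here is a minimal sketch of the revised DDL. It assumes PostgreSQL 10+ for identity columns; everything else is carried over from the question:

    CREATE TABLE public.watchzones_geog
    (
        id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- point 1: identity column, no manual sequence
        date_created timestamp with time zone DEFAULT now(),
        radius integer,                                      -- point 2: nullable columns default to NULL anyway
        "position" geography(Point,4326)                     -- point 6: latitude/longitude columns dropped
    );

    CREATE INDEX watchzones_geog_gix
        ON public.watchzones_geog USING gist ("position");

    -- point 7: text instead of varchar(20)
    ALTER TABLE public.incident_geog
        ALTER COLUMN incident_name TYPE text;

    -- point 3: refresh planner statistics after loading data
    VACUUM ANALYZE public.watchzones_geog;
    VACUUM ANALYZE public.incident_geog;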