1
votes

I have 2.2M line geometries (roads) in one table and 1500 line geometries (coast lines) in another. Both tables have a spatial index. I need to find the endpoints of roads which are within a certain distance from the coast and store the point geometry along with the distance. Current solution, which seems ineffecient, and takes many many hours to complete on a very fast machine;

CREATE TEMP TABLE with start and end points of the road geometries within distance, using ST_STARTPOINT, ST_ENDPOINT and ST_DWITHIN. CREATE SPATIAL INDEXES for both geometry columns in the temp table.

Do two INSERT INTO operations, one for startpoints and one for endpoints; SELECT geometry and distance, using ST_DISTANCE from point to coastline and a WHERE ST_DWITHIN to only consider points within the chosen distance.

Code looks something along these lines:

create temp table roadpoints_temp as select st_startpoint(road.geom) as geomstart, st_endpoint(road.geom) as geomend from 
    coastline_table coast, roadline_table road where st_dwithin(road.geom, coast.geom, 100);

create index on roadpoints_temp (geomstart);

create index on roadpoints_temp (geomend);

create table roadcoast_points as select roadpoints_temp.geomstart as geom, round(cast(st_distance(roadpoints_temp.geomstart,kyst.geom) as numeric),2) as dist 
    from roadpoints_temp, coastline_table coast where st_dwithin(roadpoints_temp.geomstart, coast.geom, 100);

insert into roadcoast_points select roadpoints_temp.geomend as geom, round(cast(st_distance(roadpoints_temp.geomend,kyst.geom) as numeric),2) as dist 
    from roadpoints_temp, coastline_table coast where st_dwithin(roadpoints_temp.geomend, coast.geom, 100);

drop table roadpoints_temp;

All comments and suggestions welcome :-)

1
Try adding USING gist for the indexes? postgresql.org/docs/9.3/interactive/sql-createindex.htmljoop
I tried your code with some test data but had no performance problems. Do you have very complex geometries? (with many vertices). Maybe can you provide a sample from your data?Tom-db
I have tried explicitly using gist, but this did not affect performance - the creation of the temp table seems to take up the bulk of the processing time.BobJohnson
The longest coast line is 1000+ km and is made up of 190.000 vertices. I will look into st_segmentize to see if that has an effect on the speed.BobJohnson

1 Answers

0
votes

You need to effectively utilize your indexes. It seems that fastest plan would be to find for each coast all the roads that are within distance of it. Doing two rechecks separately means you lose connection of closest coastline to the road and need to re-find this pair again and again.

You need to check your execution plan using EXPLAIN to have a Seq Scan on coastline table and GiST index scan on road table.

select road.* 
from coastline_table coast, roadline_table road
where 
    ST_DWithin(coast.geom, road.geom, 100) -- indexed query    
    and -- non-indexed recheck
    (
        ST_DWithin(ST_StartPoint(road.geom), coast.geom, 100)
        or ST_DWithin(ST_EndPoint(road.geom), coast.geom, 100)
    );