2
votes

I am trying to connect each record from table 1 (10k records) with its nearest neighbor in table (20k records). I am hoping there is a solution that does not require me to iterate over table 1 and perform a KNN for each record.

I have tried to do a KNN index limiting by reasonable distance, and selecting distinct on table 1 id from that as a sub-query, but this requires me to order by the id number rather than the distance.

Any advice is greatly appreciated.

select distinct on (t1id) t1id, it2d, dist, name, street
from (
    select t1id, it2d, dist, name, street
    from (
        select t1id, it2d, st_distance(t1geom, t2geom) as dist, name, street
        from (
            select t1.id as t1id,t1.geom as t1geom, t2.id as t2id, t2.geom as t2geom, t1.name, t2.street
                from t1
            join t2
                on st_dwithin(t1.geom, t2.geom, 300)
            where t1.seg is null 
        ) as near
        order by t1geom <-> t2geom  
    ) as distOrdered
    order by dist 
) as idOrdered
order by t1id 
1

1 Answers

0
votes

I found a method that seems to be working, but it is a bit ugly. I query the results of the 1st query to get the id number and the shortest distance and then join that back to the original query to get the shortest distance for each record.

This is far from ideal, but I believe is returning the correct results.

select t1id, it2d, dist, name, street
from (
    select t1id, it2d, dist, name, street
    from (
        select t1id, it2d, dist, name, street
        from (
            select t1id, it2d, st_distance(t1geom, t2geom) as dist, name, street
            from (
                select t1.id as t1id,t1.geom as t1geom, t2.id as t2id, t2.geom as t2geom, t1.name, t2.street
                    from t1
                join t2
                    on st_dwithin(t1.geom, t2.geom, 300)
                where t1.seg is null 
            ) as near
            order by t1geom <-> t2geom  
        ) as distOrdered
        order by dist 
    ) as idOrdered
) as allD
join (
    select distinct on (t1id) t1id, min(dist) as md
        from (
            select t1id, it2d, dist, name, street
            from (
                select t1id, it2d, st_distance(t1geom, t2geom) as dist, name, street
                from (
                    select t1.id as t1id,t1.geom as t1geom, t2.id as t2id, t2.geom as t2geom, t1.name, t2.street
                        from t1
                    join t2
                        on st_dwithin(t1.geom, t2.geom, 300)
                    where t1.seg is null 
                ) as near
                order by t1geom <-> t2geom  
            ) as distOrdered
            order by dist 
        ) as idOrdered
        group by t1id order by t1id
    ) as short
on allD.t1id = short.t1id and allD.dist = short.md

There is actually a much simpler way of doing this by using cross join lateral which is effectively doing a nearest neighbor search for each row.

select 
t1.id, t1.id, dist
from t1
cross join lateral 
(
    select st_distance(t1.geom,  t2.geom) dist, t2.id
    from t2
    where t1.seg is null 
    order by t1.geom <->  t2.geom
    limit 1
) as b
where b.dist < 300