0
votes

I have a graph where each node has coordinates in 2D (it's actually a geographic graph, with latitude and longitude.) I need to verify that if the distance between two edges is less than MAX_DIST then they share a node. Of course, if they intersect, then the distance between them is zero.

The brute force algorithm is trivial, is there a more efficient algorithm?

I was thinking of trying to adapt https://en.wikipedia.org/wiki/Closest_pair_of_points_problem to graph edges (and ignoring pairs of edges with a shared node), but it is not trivial to do so.

1
How about adding all edges to a Rtree index; afterwards you would only need to check the closest edges for MAX_DIST instead of all of them.Ionut Ticus
I don't know how an Rtree with all the edges would help me find the closest edgeslorg
After creating the index you can use functions like get_nearest_objects(bbox) which is a lot faster than manually checking all other edges; think of it like a regular index you use when working with a database. I can try to run some tests if you can provide some test data.Ionut Ticus

1 Answers

1
votes

I was curios to see how the rtree index idea would perform so I created a small script to test it using two really cool libraries for Python: Rtree and shapely
The snippet generates 1000 segments with 1 < length < 5 and coordinates in the [0, 100] interval, populates the index and then counts the pairs that are closer than MAX_DIST==0.1 (using the classic and the index-based method).
In my tests the index method was around 25x faster using the conditions above; this might vary greatly for your data set but the result is encouraging:

found 532 pairs of close segments using classic method
7.47 seconds for classic count
found 532 pairs of close segments using index method
0.28 seconds for index count

The performance and correctness of the index method depends on how your segments are distributed (how many are close, if you have very long segments, the parameters used).

import time
import random
from rtree import Rtree
from shapely.geometry import LineString


def generate_segments(number):
    segments = {}
    for i in range(number):
        while True:
            x1 = random.randint(0, 100)
            y1 = random.randint(0, 100)
            x2 = random.randint(0, 100)
            y2 = random.randint(0, 100)
            segment = LineString([(x1, y1), (x2, y2)])
            if 1 < segment.length < 5:  # only add relatively small segments
                segments[i] = segment
                break
    return segments


def populate_index(segments):
    idx = Rtree()
    for index, segment in segments.items():
        idx.add(index, segment.bounds)
    return idx


def count_close_segments(segments, max_distance):
    count = 0
    for i in range(len(segments)-1):
        s1 = segments[i]
        for j in range(i+1, len(segments)):
            s2 = segments[j]
            if s1.distance(s2) < max_distance:
                count += 1
    return count


def count_close_segments_index(segments, idx, max_distance):
    count = 0
    for index, segment in segments.items():
        close_indexes = idx.nearest(segment.bounds, 10)
        for close_index in close_indexes:
            if index >= close_index:  # do not count duplicates
                continue
            close_segment = segments[close_index]
            if segment.distance(close_segment) < max_distance:
                count += 1
    return count


if __name__ == "__main__":
    MAX_DIST = 0.1
    s = generate_segments(1000)
    r_idx = populate_index(s)
    t = time.time()
    print("found %d pairs of close segments using classic method" % count_close_segments(s, MAX_DIST))
    print("%.2f seconds for classic count" % (time.time() - t))
    t = time.time()
    print("found %d pairs of close segments using index method" % count_close_segments_index(s, r_idx, MAX_DIST))
    print("%.2f seconds for index count" % (time.time() - t))