5
votes

I have to find all closest pairs of points in a plane from a given set.

I've successfully implemented a naive algorithm O(n²) similar to this pseudocode function, where set is a list of all points on input, count is count of all points on input, which returns dist which is minimum distance found and result is list of all point-pairs with such distance.

function naive(Points[] set, int count)
    result = []
    distance = INF

    for i=0 to count-1:
        for j=i+1 to count:
           newDistance = distance(set[i], set[j])
           if newDistance < distance:
              result = []
              result.append({set[i], set[j]})
           else if newDistance == distance:
              result.append({set[i], set[j]})

   return (dist, result)

This solution works well, but is, thanks to high O(n²) complexity, very slow for larger inputs. I want to find a faster, optimised solution, so I've implemented an O(n logn) solution using recursive divide-and-conquer algorithm based on this article, which works well for most inputs, but since such approach does not iterate through all the points it fails on inputs like this one (order of pairs and order of points inside pairs does not matter):

Input:
{A[0,0], B[1,0], C[1,1] D[0,1]}

Current (wrong) output:
{A[0,0], D[0,1]}, {B[1,0], C[1,1]}

Desired output:
{A[0,0], B[0,1]}, {A[0,0], D[0,1]}, {C[1,1], B[1,0]}, {C[1,1], D[1,0]}

and also since it's recursive, stack overflows easily for larger inputs. What's a better way to solve this problem?

Thank you

3

3 Answers

1
votes

But you don't need to compare everything to everything else. For any given pair of points, imagine they lie on a rectangle's [the rectangle being aligned with the coordinate axes] diagonal with length d.

  • If the slope is positive:
    • any other points which lie to the left of the line x = d will be farther and should not be considered.
    • any other points which lie below the line y = d will be farther and should not be considered.
  • If the slope is negative:
    • any other points which lie to the right of the line x = d will be farther and should not be considered.
    • any other points which lie below the line y = d will be farther and should not be considered.

Similar constraints can be imagined to give you a bounding box in every case which should serve to eliminate a large percentage of points to consider in your inner loop unless you have a particularly dense constellation.

You definitely need quite a bit of dynamic programming here to give you reasonable runtimes for large sets.

0
votes

The best solution may differ depending on how points are distributed. One approach is to carve up the square into many smaller subsquares, chosen so that each subsquare has about 1 point on average. Next, associate points with the corresponding subsquare. Finally, for any given point, you only need to consider the nearest subsquares to find the closest neighbor (confirming nearesr status by checking all subsquares that could possibly contain a nearer point).

0
votes

Regarding the first question. I may be mistaken, but I can give you some hint.

Let's say you have 1000 points. You divide them into 4 groups by 250 points, and then you make 500 element sets out of all group pairs. Totally 6. In this case any pair of points will be covered by some half (pair of groups). And don't forget to keep all min values, not a single one on each iteration of recursion.

Also, because you have to do grouping each time you divide a set, you increase complexity here, so you might consider "slow" O(n²) algorithm for small sets, so the actual solution will be a combination of O(n log n) and O(n²).

For the second question, I can suggest only general way to avoid recursion. If you have enough memory, use dynamic programming, i.e. keep all previous results in memory and go through until you fill your array with all possible parameters. So instead of making recursive call you just take value from your array. And of course you have to start again until you have no empty values in array. The realization is different from task to task, so you have to think how to implement this one.