1
votes

Q) Find the closest pair of points in a given set of points.

Expected time complexity: O(nlog2(n)) / O(nlog(n))

i was reading up this problem from here. However i am unable to understand a few things:

  1. what do we achieve by sorting points by Y which are already sorted by X?

  2. when we take a point as mid from one of the halves, dont we miss out on a few cases? i.e. how does taking a mid point ensure that the strip formed includes all the points that are candidates for a distance less than the distance returned by the two recursive calls?

  3. even if the above point is proved, shouldnt we take the mid point of both halves i.e. m of left half and (m+1) of right half?

2

2 Answers

2
votes

2) You use the x coordinate of the mid point to find a line that splits the points in half. The recursive calls compare all pairs that don't cross that line, so all that remains is to compare pairs that do cross the line. If two points are less than d apart, and they're on either side of the line, then they must both be less than d away from the line.

3) Well, that will be done in the recursive calls. I don't know what else you could mean.

1) The points in the strip are separated out and sorted. Then, given one point in the strip, all the points with y coordinate within d are in a contiguous region around it. These are the only other points you need to compare it with, and the sorting makes it quick and easy to find these points. The magic of this algorithm is that the number of possible points in any such region is strictly limited to a constant value. This follows from the fact (established by the recursive calls) that no pair of points on either side of the line are closer than d.

1
votes

If I understand this correctly, the following point is actually a bit misleading:

3) Recursively find the smallest distances in both subarrays. [...]

I think it should say: Split the subarrays recursively until they contain a small number of points (maybe 2 or 3? It doesn't say) and then find the smallest distance in each of this small strips. Then we proceed with 4)-7) and apply a y-strip search for every border between all subarrays that we created.

Here is my understanding of what happens, it doesn't mention array splitting because I think this only confuses what is going on:

1) Sort all points by x. Walk over the sorted array and calculate the distance for each consecutive pair of points, ie. each point and the one that follows it. We get a minimum distance 'd'.

2) Sort all point by y. Again, for each point we calculate the distance to its direct following neighbor(s) in the sorted array. Unlike '1)', we do not only consider the direct neighbors, but all neighbors that are less than 'd' ahead of the current point. When we find a pair with distance smaller 'd', we use this new distance as our new 'd' from this moment on.

This should give us the smallest d.

Basically, splitting an array recursively until only pairs remain (which is what I guess they imply) is in effect the same as simply walking over the array and comparing each consecutive point pair. There is one difference in my step 1) though: they only calculate the x-distance of every 2nd point to its successor (there is a subarray border after every 2nd point). This may help (because they are doing only half the work) or be detrimental (because they may no get the best d from this first pass).

I haven't verified this, please let me know if I understood this wrong or if there are problems with my 'translated' approach.