Getting the first k closest pairs of points from a set of n points in O(nlogn) time?

Question

Is it possible to find the k pairs of closest points in a set of n points faster than O(n^2)?

I know that I can calculate the closest pair of points in O(nlogn), but with that algorithm, not all of the distances are calculated, so I cannot return the top k closest pairs of points.

This problem is trivial if using the "Brute Force" method of calculating the distance of all the edges of points, but this has a complexity of [n * (n-1)]/2 and I would like to find something more efficient.

Edit: See the closest pairs algorithm here: https://www.geeksforgeeks.org/closest-pair-of-points-using-divide-and-conquer-algorithm/

Re: "I can calculate the closest pair of points in O(nlogn), but with that algorithm, not all of the distances are calculated, so I cannot return the top k closest pairs of points": Could you provide details about the algorithm you're referring to? (Or a link to the details?) — ruakh
(The obvious extension would seem to be to use a priority queue instead of a min candidate.) — greybeard

Dillon Davis Dillon Davis · Accepted Answer · 2019-03-04T21:22:13

One viable option for small k would be to use your O(nlogn) algorithm repeatedly on subsets of the original set of points, removing points as you go. More specifically, keep a set of points that formed a minimal pair. Every time you query for the next closest pair, we'll query for the next closest pair within these points and between each point and the rest of the original points, and take the closer of the two pairs.

To start, remove all but one of these points from the original set, and calculate the closest pair. Repeat this for each of the points in this "min set", and keep the overall closest pair. This will take O(j*nlogn) time to compute, when the "min set" is of size j.

Then, query for the next closest pair within this "min set" via a find-min (O(1) time) on a min-max heap of size k that we'll build as we add points to our "min set". Every time we add points to our "min set", we'll calculate the distance between each point in "min set" (size j) and these (up to) 2 new points, insert these into our min-max heap, and then delete as many elements as necessary to make the heap size k again (at most 2j elements) in O(jlogk) time.

Now, we take the closer pair of these two (deleting from the heap if relevant- O(logk) time), add the points to our "min set" and update the min-max heap as described, and repeat for the remaining k-j closest pairs. Overall, this will take O((k^2)nlogn + (k^2)logk + klogk) = O((k^2)(nlogn + logk)) = O((k^2)nlogn) time.

Getting the first k closest pairs of points from a set of n points in O(nlogn) time?

1 Answers