0
votes

This is an algorithm question.

Given 1 million points, each with x and y coordinates that are floating-point numbers.

Find the 10 closest points to a given point as fast as possible.

Closeness can be measured as Euclidean distance on a plane or another kind of distance on a globe. I prefer a binary-search-style approach because of the large number of points.

My idea:

Save the points in a database.

1. Amplify x by a large integer, e.g. 10^4, cut off the decimal part, and then amplify the resulting integer part by 10^4 again.

2. Amplify y by a smaller integer, e.g. 10^2, and cut off the decimal part.

3. Sum the results of steps 1 and 2; call this sum the associate_value.

4. Repeat steps 1 to 3 for each point in the database.

E.g.

    x = 12.3456789, y = 98.7654321
    x times 10^4 = 123456 (decimal part cut off), then times 10^4 again to get 1234560000
    y times 10^2 = 9876.54321, cut off the decimal part to get 9876
    Sum them: 1234560000 + 9876 = 1234569876

In this way, I transform the 2-d data into 1-d data. In the database, each point is associated with an integer (its associate_value). The integer column can be indexed in the database for fast search.
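For concreteness, here is a minimal Python sketch of this encoding, assuming coordinates in the range [0, 100) and the scale factors from the example above; the function name associate_value is just for illustration:

    def associate_value(x, y):
        # x: keep 4 decimal digits, then shift left by 4 decimal places
        x_part = int(x * 10**4)          # 12.3456789 -> 123456
        # y: keep 2 decimal digits, so the y part stays below 10^4
        y_part = int(y * 10**2)          # 98.7654321 -> 9876
        return x_part * 10**4 + y_part   # 1234560000 + 9876 = 1234569876

    print(associate_value(12.3456789, 98.7654321))  # 1234569876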

For a given point (x, y), I perform steps 1-3 on it and then find the points in the database whose associate_value is close to the given point's associate_value.

E.g.

    x = 59.469797, y = 96.4976416
    its associate_value is 5946979649

Then, in the database, I search for associate_values that are close to 5946979649, for example between 5946979649 - 50 and 5946979649 + 50, or between 5946979649 - 50000000 and 5946979649 + 50000000. This can be done with an index search in the database.

In this way, I can find a group of points that are close to the given point and greatly reduce the search space. Then I can use the Euclidean (or another) distance formula on that group to find the closest points.
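To see how the whole lookup would behave, here is a rough sketch that uses a sorted Python list and bisect as a stand-in for the database index; the random point data, the window size and the function names are made up for illustration:

    import bisect
    import math
    import random

    def associate_value(x, y):
        return int(x * 10**4) * 10**4 + int(y * 10**2)

    # made-up data: 1 million points with coordinates in [0, 100)
    points = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(1_000_000)]

    # the sorted list plays the role of the database index on associate_value
    index = sorted((associate_value(x, y), x, y) for x, y in points)
    keys = [k for k, _, _ in index]

    def ten_closest(qx, qy, window=1_000_000, k=10):
        # range search on the 1-d key; a window of 10^6 corresponds to
        # roughly +/- 0.01 in x and is purely a tuning knob
        q = associate_value(qx, qy)
        lo = bisect.bisect_left(keys, q - window)
        hi = bisect.bisect_right(keys, q + window)
        candidates = index[lo:hi]
        # re-rank the candidates by true Euclidean distance
        candidates.sort(key=lambda t: math.hypot(t[1] - qx, t[2] - qy))
        return [(x, y) for _, x, y in candidates[:k]]

    print(ten_closest(59.469797, 96.4976416))

Note that because the key is dominated by x, the window effectively bounds only the x difference: candidates can still be far away in y, and genuinely close points whose x difference exceeds the window are missed entirely.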

I am not sure about the efficiency of this algorithm, especially the process of generating the associate_values.

Does my idea work or not? Any better ideas?

Thanks

It seems like you have to go through each point multiple times; you might as well just calculate the distance between the specified point and every other point once, and keep the top 10 points in a list. – Moop
Are they always going to be the same million points, or will they be random each time you want to search? – Moop
@Moop, the 1 million points are fixed. Thanks. – user3601704
I would look into R-trees if you are interested: en.wikipedia.org/wiki/R-tree – Moop
The quickest way is just to have each point object reference its nearest 10 other points. Obviously that will take a lot of memory (10x) and probably take a while to generate at first, but the search time would then be O(1). – Moop

3 Answers

0
votes

Your idea seems like it may work, but I would be concerned about degenerate cases (like when no points fall within your specified ranges, though maybe that is not possible given the constraints). Either way, since you asked for other ideas, here is my stab at it: store all of your points in a quad tree, then just walk down the quad tree until you have a sufficiently small group to search through. Since the points are fixed, building the quad tree is a one-time cost, and each search should be logarithmic in the number of points you have.
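To make this concrete, here is a minimal point-quadtree sketch of the idea, assuming coordinates in [0, 100); the class name and leaf capacity are arbitrary. Walking down to the query's leaf and brute-forcing it is the "sufficiently small group" step, with the caveat that a leaf may hold fewer than 10 points and that neighbours just across a cell boundary are missed unless adjacent cells are also checked:

    import math

    class QuadNode:
        CAPACITY = 64  # max points per leaf before splitting (arbitrary)

        def __init__(self, x0, y0, x1, y1):
            self.x0, self.y0, self.x1, self.y1 = x0, y0, x1, y1
            self.points = []
            self.children = None

        def insert(self, p):
            if self.children is not None:
                self._child_for(p).insert(p)
                return
            self.points.append(p)
            if len(self.points) > self.CAPACITY:
                self._split()

        def _split(self):
            mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
            self.children = [
                QuadNode(self.x0, self.y0, mx, my), QuadNode(mx, self.y0, self.x1, my),
                QuadNode(self.x0, my, mx, self.y1), QuadNode(mx, my, self.x1, self.y1),
            ]
            for p in self.points:
                self._child_for(p).insert(p)
            self.points = []

        def _child_for(self, p):
            mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
            return self.children[(1 if p[0] >= mx else 0) + (2 if p[1] >= my else 0)]

        def leaf_for(self, p):
            return self if self.children is None else self._child_for(p).leaf_for(p)

    def ten_closest(root, q):
        # search only the leaf cell containing the query point
        group = root.leaf_for(q).points
        return sorted(group, key=lambda p: math.hypot(p[0] - q[0], p[1] - q[1]))[:10]

    # usage: root = QuadNode(0, 0, 100, 100); insert every point once; then query
    # root.insert((12.3456789, 98.7654321)); ten_closest(root, (59.47, 96.50))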

0
votes

You can do better by interleaving the bits of the x and y coordinates instead of concatenating the scaled values. Rather than ordering the points along a straight line, this orders them along a z-curve, and you can compute upper bounds from the most significant bits. The z-curve is often used in mapping applications: http://msdn.microsoft.com/en-us/library/bb259689.aspx
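For reference, a small sketch of the bit interleaving that produces the z-order (Morton) key, assuming the coordinates are first quantised to 16-bit integers; the scale factor and names are made up:

    def interleave_bits(xi, yi):
        # spread the bits: x bits go to even positions, y bits to odd positions,
        # so points that are close in both x and y get nearby keys
        z = 0
        for i in range(16):
            z |= ((xi >> i) & 1) << (2 * i)
            z |= ((yi >> i) & 1) << (2 * i + 1)
        return z

    def morton_key(x, y, scale=65536 / 100):
        # quantise coordinates in [0, 100) onto a 2^16 grid, then interleave
        return interleave_bits(int(x * scale), int(y * scale))

    print(morton_key(59.469797, 96.4976416))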

-1
votes

The way I read your algorithm, you are selecting values along a line with a slope of -1 that are similar to your point, i.e. if your point is (2, 2) you would look at points (1, 3), (0, 4) and (-1, 5) and likely miss closer points. Most algorithms that solve this are O(n), which isn't terribly bad.

A simple algorithm for this problem is to keep a priority queue of the closest ten points, plus the distance of the furthest of those ten, as you iterate over the set. If a point's x or y difference alone already exceeds that furthest distance, discard it immediately. Otherwise compute its distance with whatever distance measure you're using and see whether it gets inserted into the queue. If it does, update your top-ten distance threshold and continue iterating.
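A sketch of that linear scan, using Python's heapq as a max-heap (negated distances) for the current ten closest, with the per-axis rejection test described above; names are made up:

    import heapq
    import math

    def ten_closest(points, qx, qy, k=10):
        heap = []  # max-heap via negated distances: (-dist, x, y)
        for x, y in points:
            if len(heap) == k:
                threshold = -heap[0][0]  # distance of the current tenth-closest
                # cheap rejection: if one axis already exceeds the threshold,
                # the Euclidean distance must as well
                if abs(x - qx) > threshold or abs(y - qy) > threshold:
                    continue
            d = math.hypot(x - qx, y - qy)
            if len(heap) < k:
                heapq.heappush(heap, (-d, x, y))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, x, y))
        return sorted((-nd, x, y) for nd, x, y in heap)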

If your points are pre-sorted on one of the axes, you can further optimize the algorithm by starting at the query's position on that axis and radiating outward until the difference on that axis alone is greater than the distance to your tenth-closest point. I did not include sorting in the description above because sorting is O(n log n), which is slower than O(n); but if you are doing this multiple times on the same set, it could be beneficial to sort it once.
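And a sketch of the pre-sorted variant, assuming the points have been sorted by x once up front: start at the query's position in that order and radiate outward, giving up on a side once its x gap alone exceeds the current tenth-closest distance:

    import bisect
    import heapq
    import math

    def ten_closest_sorted(points_by_x, qx, qy, k=10):
        xs = [p[0] for p in points_by_x]
        start = bisect.bisect_left(xs, qx)
        heap = []  # max-heap via negated distances

        def consider(i):
            x, y = points_by_x[i]
            d = math.hypot(x - qx, y - qy)
            if len(heap) < k:
                heapq.heappush(heap, (-d, x, y))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, x, y))

        left, right = start - 1, start
        while left >= 0 or right < len(points_by_x):
            threshold = -heap[0][0] if len(heap) == k else math.inf
            go_left = left >= 0 and qx - xs[left] <= threshold
            go_right = right < len(points_by_x) and xs[right] - qx <= threshold
            if not go_left and not go_right and len(heap) == k:
                break  # no remaining point can beat the tenth-closest
            if go_left:
                consider(left)
            if go_right:
                consider(right)
            left -= 1
            right += 1
        return sorted((-nd, x, y) for nd, x, y in heap)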