3
votes

I have a set of about 1000 geographical coordinates (lat, long). Given one coordinate, I want to find the closest one in that set. My approach was to measure the distance to every point, but at hundreds of requests per second, doing all that math can be a little rough on the server.
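The brute-force approach can be sketched like this (Python; the haversine helper and the sample points are illustrative, not my real data):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points, in kilometres.
    r = 6371.0  # mean Earth radius
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def nearest(target, points):
    # Brute force: compute the distance to every point and keep the minimum.
    return min(points, key=lambda p: haversine_km(target[0], target[1], p[0], p[1]))

points = [(44.4239734, 26.1504752), (44.43, 26.10), (45.0, 25.0)]
print(nearest((44.42, 26.15), points))  # -> (44.4239734, 26.1504752)
```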

What is the best way to optimize this?

Thanks

4
Your question is not clear. If you have "1000 set(s?)", which one is "that set"? – leaf
A set containing 1000 elements. One element is a lat/long coordinate, e.g. [44.4239734, 26.1504752]. – keepwalking
Look at this library: github.com/darkskyapp/sphere-knn – stdob--
By the way, on Stack Overflow you are expected to show your attempts :-| This website seems more appropriate: cs.stackexchange.com. – leaf
@procrastinator ok. – keepwalking

4 Answers

2
votes

You can use the sphere-knn library, or look at something like PostGIS.

1
votes

Why not select the potentially closest points from the set first (e.g. set a threshold, say 0.1, and filter the set so that you keep only points within ±0.1 of your target point on both axes)? Then do the actual distance calculations on this reduced set.

If none are within the first range, just enlarge it (0.2) and repeat (0.3, 0.4, ...) until you've got a match. Obviously you would tune the threshold so it best matches your likely results.

(I'm assuming the time-consuming bit is the actual distance calculation, so the idea is to limit the number of calculations.)
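A minimal sketch of this expanding-box filter in Python (the haversine helper and the 0.1-degree starting threshold are illustrative assumptions):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in kilometres (the "expensive" calculation).
    r = 6371.0  # mean Earth radius
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def nearest_filtered(target, points, threshold=0.1):
    # Grow the bounding box until at least one candidate falls inside it,
    # then run the expensive distance calculation only on those candidates.
    lat, lon = target
    while True:
        candidates = [p for p in points
                      if abs(p[0] - lat) <= threshold and abs(p[1] - lon) <= threshold]
        if candidates:
            return min(candidates,
                       key=lambda p: haversine_km(lat, lon, p[0], p[1]))
        threshold += 0.1  # none in range: enlarge the box and try again
```

Note that this is an approximation: a point just outside the box can occasionally be closer than one inside it, so in practice you might expand the box one extra step before trusting the result.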

1
votes

An Algorithmic Response

Your approach is already O(n) per query. A linear scan is algorithmically reasonable for a set this size, and it's very simple to implement.

If that is not enough, you should consider taking a look at R-trees. The idea behind R-trees is roughly paraphrased as follows:

  1. You already have a set of n elements. You can preprocess this data to form rough 'squares' of regions each containing a set of points, with an established boundary.
  2. Now say a new element comes in. Instead of comparing across every coordinate, you identify which 'square' it belongs in by just comparing whether the point is smaller than the boundaries, and then measure the distance with only the points inside that square.

You can see at once the benefits:

  • You are no longer comparing against all coordinates, but instead only the boundaries (strictly less than the number of all elements) and then against the number of coordinates within your chosen boundary (also less than the number of all elements).
  • The worst case of such a query is still O(n) time, but on average it is closer to O(log n).

The main cost is pushed into the pre-processing step (which is 'free' in the sense that it's a one-time cost), and each query then needs far fewer comparisons.
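The two steps above can be sketched with a simplified fixed-grid variant of the same idea (a real R-tree builds adaptive rectangles; the 1-degree cell size and the helpers here are illustrative assumptions):

```python
import math
from collections import defaultdict

CELL = 1.0  # grid cell size in degrees; tune to your data density

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in kilometres.
    r = 6371.0  # mean Earth radius
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def build_index(points):
    # Step 1 (pre-processing, the one-time cost): bucket every point
    # into the grid cell whose boundaries contain it.
    index = defaultdict(list)
    for p in points:
        index[(math.floor(p[0] / CELL), math.floor(p[1] / CELL))].append(p)
    return index

def nearest(target, index):
    # Step 2 (query): look only inside the target's cell, expanding
    # ring by ring, instead of comparing against every coordinate.
    ci, cj = math.floor(target[0] / CELL), math.floor(target[1] / CELL)
    ring = 0
    while True:
        candidates = [p
                      for i in range(ci - ring, ci + ring + 1)
                      for j in range(cj - ring, cj + ring + 1)
                      for p in index.get((i, j), [])]
        if candidates:
            return min(candidates,
                       key=lambda p: haversine_km(target[0], target[1], p[0], p[1]))
        ring += 1
```

As with any bounding-box shortcut, a point in a neighbouring cell can occasionally be closer than one in the matched cell; a production index (an actual R-tree, or a library like sphere-knn) handles that edge case for you.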

A Systemic Response

Just buy another server, and distribute the requests and the elements using a load balancer such as HAProxy.

Servers are fairly cheap, especially relative to how critical the service is to your business, and if you want to be fast, scaling out is an easy way to do it.