
I have one set (X) of points (not very big, let's say 1-20 points) and a second, much larger set of points (Y). I need to choose the point from Y whose sum of distances to all points in X is minimal.

I came up with the idea of treating X as the vertices of a polygon, finding the centroid of this polygon, and then choosing the point from Y nearest to the centroid. But I'm not sure whether the centroid minimizes the sum of its distances to the polygon's vertices, so I'm not sure this is a good approach. Is there an algorithm for solving this problem?

Points are defined by geographical coordinates.

Do you mean latitude-longitude on a curved surface, or x-y on a plane? - David Thornley
The centroid doesn't minimize the sum of distances to the vertices. For example, in the case of a triangle, the Torricelli point (en.wikipedia.org/wiki/Torricelli_point) is optimal. - adamax

4 Answers


The centroid of the polygon might not be the right point, but an optimal point does exist.

The paper "n-ellipses and the minimum distance problem" shows that if the points (called foci; your set X) are not collinear, then:

  • There is a unique point (called the center) for which the sum of distances is minimized. Unless the center coincides with one of the foci, it is the point where the sum of the unit vectors from it to the foci is zero!

  • The locus of points for which the sum of distances is constant is a convex curve (called an n-ellipse) containing the center

  • The n-ellipse for distance D completely contains the n-ellipse for any other distance D' for which D' < D.

Thus you can use a hill-climbing (descent) algorithm to find the center: the sum of distances is convex, so there are no spurious local minima to get stuck in.
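One concrete descent method is Weiszfeld's algorithm. It is a standard fixed-point iteration for this "geometric median" problem, swapped in here and not taken from the paper. The sketch below assumes planar (x, y) coordinates, so latitude/longitude over a small area would need to be projected first. The function name `geometric_median` is hypothetical. Each step moves to the distance-weighted average `sum(x_i/d_i) / sum(1/d_i)`. When an iterate lands exactly on a point of X, the sketch applies the Vardi-Zhang correction, which stops there if the unit vectors toward the other points cannot pull it away.

```python
import math

def geometric_median(points, tol=1e-9, max_iter=1000):
    """Weiszfeld iteration (with the Vardi-Zhang fix) for the point minimizing
    the sum of Euclidean distances to `points` (a list of (x, y) tuples)."""
    n = len(points)
    # Start from the centroid; it is a reasonable guess but generally not optimal.
    y = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        rx = ry = 0.0  # sum of unit vectors from y toward the points
        eta = 0        # number of input points that coincide with y
        for px, py in points:
            d = math.hypot(px - y[0], py - y[1])
            if d < 1e-12:
                eta += 1
                continue
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
            rx += (px - y[0]) / d
            ry += (py - y[1]) / d
        if denom == 0.0:  # every input point sits at y
            return y
        t = (num_x / denom, num_y / denom)  # plain Weiszfeld step
        if eta == 0:
            new = t
        else:
            r = math.hypot(rx, ry)
            if r <= eta:  # y is an input point and is already optimal
                return y
            w = eta / r
            new = ((1 - w) * t[0] + w * y[0], (1 - w) * t[1] + w * y[1])
        if math.hypot(new[0] - y[0], new[1] - y[1]) < tol:
            return new
        y = new
    return y
```

Once the center is known, it can serve as the reference point for searching Y. It is not guaranteed that the Y point nearest to it is optimal, for the reason given next.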

Of course, these n-ellipses are not necessarily circles, so picking the point of Y closest to the center is not guaranteed to be optimal, though it may be a good approximation.
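A small hypothetical example shows this failure mode. X is a flat triangle whose angle at (0, 1) exceeds 120 degrees, so its center is that vertex itself. Of two candidates in Y, the one nearer the center has the larger total distance.

```python
import math

def total_distance(p, xs):
    """Sum of Euclidean distances from p to every point in xs."""
    return sum(math.dist(p, x) for x in xs)

# Hypothetical data: the angle at (0, 1) is about 169 degrees, so (0, 1) is the center.
X = [(-10.0, 0.0), (10.0, 0.0), (0.0, 1.0)]
center = (0.0, 1.0)
Y = [(0.0, 6.0), (7.0, 1.0)]

nearest_to_center = min(Y, key=lambda y: math.dist(y, center))  # 5 units from the center
best = min(Y, key=lambda y: total_distance(y, X))                # 7 units away, yet cheaper
print(nearest_to_center, total_distance(nearest_to_center, X))
print(best, total_distance(best, X))
```

The elongated n-ellipses around this X stretch along the x-axis, which is why the farther candidate (7, 1) wins.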

You can perhaps do some preprocessing on the 20 points (if those are fixed) to figure out a good partitioning scheme (based on the above information).

Hope that helps.


If you want to minimize the sum of the squares of the distances (not the sum of the distances), then the point that minimizes that sum is the average of the points in X.

Proof:

sum(squares of distances) = (x-x0)^2 + (y-y0)^2 + (x-x1)^2 + (y-y1)^2 + ... 

d/dx sum(squares of distances) = 2(x-x0) + 2(x-x1) + ... = 2(Nx - x0 - x1 - ...)

The sum is minimized when the derivative is zero, which occurs when Nx = x0+x1+..., so x = (x0+x1+...)/N (and likewise for y).

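In fact, the function is quadratic and symmetric around this point: sum(squares of distances) = N*((x-mx)^2 + (y-my)^2) + C, where (mx, my) is the average point and C does not depend on (x, y). So the point in Y closest to this average point is exactly the best.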
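A minimal sketch of this squared-distance case, assuming planar (x, y) tuples. Because of the identity above, picking the Y point nearest to the mean of X gives the same total as a brute-force search on the sum of squares. The function names are hypothetical.

```python
import math

def sum_sq(p, xs):
    """Sum of squared Euclidean distances from p to every point in xs."""
    return sum((p[0] - x[0]) ** 2 + (p[1] - x[1]) ** 2 for x in xs)

def best_for_sum_of_squares(xs, ys):
    """Point of ys minimizing sum_sq: simply the one nearest the mean of xs."""
    n = len(xs)
    mean = (sum(x[0] for x in xs) / n, sum(x[1] for x in xs) / n)
    return min(ys, key=lambda y: math.dist(y, mean))
```

This only holds for squared distances. For the plain sum of distances, the mean is not the minimizer, as the comments on the question point out.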

Minimizing the sum of the (unsquared) distances is harder, but I suspect the same approach would also work if you test more of the points of Y near the average.


Because you want the minimal sum of distances, I believe that you can reduce the set of points X to its spatial mean. Then you can use a k-d tree or some other spatial partitioning tree to find the point in Y closest to the spatial mean of X. Using a spatial partitioning tree can save a good bit of work compared to checking all the possible points.
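If the mean of X is used only to order the search through Y, the result can still be made exact. By the triangle inequality, sum|y - x_i| >= |N*y - sum(x_i)| = N*|y - mean|. Candidates can therefore be visited in order of distance to the mean, stopping once N*|y - mean| reaches the best sum found so far. The sketch below swaps the k-d tree for a plain sort to stay dependency-free, and it assumes planar (x, y) tuples. The function name is hypothetical.

```python
import math

def min_sum_distance_point(xs, ys):
    """Exact argmin over ys of the sum of distances to xs, pruned using
    the lower bound sum|y - x_i| >= n * |y - mean(xs)|."""
    n = len(xs)
    mean = (sum(x[0] for x in xs) / n, sum(x[1] for x in xs) / n)
    # Nearest-to-mean first; a k-d tree could supply this order incrementally.
    order = sorted(ys, key=lambda y: math.dist(y, mean))
    best, best_cost = None, math.inf
    for y in order:
        if n * math.dist(y, mean) >= best_cost:
            break  # every remaining candidate has an even larger lower bound
        cost = sum(math.dist(y, x) for x in xs)
        if cost < best_cost:
            best, best_cost = y, cost
    return best, best_cost
```

The sort still touches every point of Y. The savings come from skipping the N-distance sums for candidates far from the mean.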


Excuse me for suggesting brute force. The way the question is posed, we do not know where X and Y lie. Suppose X is 30 points and Y is 1000 points. Then for each point of Y, sum 30 distances: 30,000 distance calculations altogether, done in a jiffy. This guarantees the minimum. Finding some "center" of X and choosing the closest point of Y will only give an approximate solution.
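Since the question says the points are geographic coordinates, a brute-force sketch can use great-circle distances directly. This assumes (lat, lon) pairs in degrees and a spherical Earth of mean radius 6371 km, using the haversine formula. The names `haversine_km` and `brute_force_best` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against rounding pushing h slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))

def brute_force_best(xs, ys):
    """Point of ys with the smallest total great-circle distance to all of xs."""
    return min(ys, key=lambda y: sum(haversine_km(y, x) for x in xs))
```

The cost is exactly len(X) * len(Y) distance evaluations, which matches the 30 * 1000 estimate above.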

The more interesting question is to find such a point for X alone, ignoring Y. When X has only three points, the Fermat-Torricelli point solves the problem.