This can be solved neatly with scipy.spatial.distance.pdist.
First, let's create an example array that stores points in 3D space:
import numpy as np
N = 10 # The number of points
points = np.random.rand(N, 3)
print(points)
Output:
array([[ 0.23087546, 0.56051787, 0.52412935],
[ 0.42379506, 0.19105237, 0.51566572],
[ 0.21961949, 0.14250733, 0.61098618],
[ 0.18798019, 0.39126363, 0.44501143],
[ 0.24576538, 0.08229354, 0.73466956],
[ 0.26736447, 0.78367342, 0.91844028],
[ 0.76650234, 0.40901879, 0.61249828],
[ 0.68905082, 0.45289896, 0.69096152],
[ 0.8358694 , 0.61297944, 0.51879837],
[ 0.80963247, 0.1680279 , 0.87744732]])
For each point, we compute the distance to every other point:
from scipy.spatial import distance
D = distance.squareform(distance.pdist(points))
print(np.round(D, 1)) # Rounding to fit the array on screen
Output:
array([[ 0. , 0.4, 0.4, 0.2, 0.5, 0.5, 0.6, 0.5, 0.6, 0.8],
[ 0.4, 0. , 0.2, 0.3, 0.3, 0.7, 0.4, 0.4, 0.6, 0.5],
[ 0.4, 0.2, 0. , 0.3, 0.1, 0.7, 0.6, 0.6, 0.8, 0.6],
[ 0.2, 0.3, 0.3, 0. , 0.4, 0.6, 0.6, 0.6, 0.7, 0.8],
[ 0.5, 0.3, 0.1, 0.4, 0. , 0.7, 0.6, 0.6, 0.8, 0.6],
[ 0.5, 0.7, 0.7, 0.6, 0.7, 0. , 0.7, 0.6, 0.7, 0.8],
[ 0.6, 0.4, 0.6, 0.6, 0.6, 0.7, 0. , 0.1, 0.2, 0.4],
[ 0.5, 0.4, 0.6, 0.6, 0.6, 0.6, 0.1, 0. , 0.3, 0.4],
[ 0.6, 0.6, 0.8, 0.7, 0.8, 0.7, 0.2, 0.3, 0. , 0.6],
[ 0.8, 0.5, 0.6, 0.8, 0.6, 0.8, 0.4, 0.4, 0.6, 0. ]])
You read this distance matrix like this: the distance between points 1 and 5 is D[0, 4]. You can also see that the distance between each point and itself is 0, for example D[6, 6] == 0.
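As a side note on what the two calls do: pdist by itself returns a condensed 1-D vector holding each of the N*(N-1)/2 pairwise distances once, and squareform expands that into the symmetric N x N matrix shown above. A minimal sketch to check this, assuming the default Euclidean metric (the seed and the name `condensed` are only for illustration and are not part of the original example):

```python
import numpy as np
from scipy.spatial import distance

rng = np.random.default_rng(0)   # illustrative seed, not from the original example
points = rng.random((10, 3))     # 10 random points in 3D

condensed = distance.pdist(points)     # 1-D vector, one entry per unordered pair
D = distance.squareform(condensed)     # symmetric N x N matrix with a zero diagonal

print(condensed.shape)  # (45,)
print(D.shape)          # (10, 10)

# D[i, j] should match the Euclidean distance computed directly
direct = np.linalg.norm(points[0] - points[4])
print(np.isclose(D[0, 4], direct))  # True
```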
We argsort each row of the distance matrix to get, for each point, a list of which points are closest:
closest = np.argsort(D, axis=1)
print(closest)
Output:
[[0 3 1 2 5 7 4 6 8 9]
[1 2 4 3 7 0 6 9 8 5]
[2 4 1 3 0 7 6 9 5 8]
[3 0 2 1 4 7 6 5 8 9]
[4 2 1 3 0 7 9 6 5 8]
[5 0 7 3 6 2 8 4 1 9]
[6 7 8 9 1 0 3 2 4 5]
[7 6 8 9 1 0 3 2 4 5]
[8 6 7 9 1 0 3 5 2 4]
[9 6 7 1 8 4 2 0 3 5]]
Again, we see that each point is closest to itself. So, disregarding that, we can now select the k closest points:
k = 3 # For each point, find the 3 closest points
print(closest[:, 1:k+1])
Output:
[[3 1 2]
[2 4 3]
[4 1 3]
[0 2 1]
[2 1 3]
[0 7 3]
[7 8 9]
[6 8 9]
[6 7 9]
[6 7 1]]
For example, we see that for point 4 (row index 3), the k=3 closest points are 1, 3 and 2 (indices 0, 2 and 1).
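The steps above can be wrapped into one helper. Since the full distance matrix needs O(N²) memory, the sketch below also shows a possible alternative based on scipy.spatial.cKDTree, which is a different technique from the one in this answer. The function names are hypothetical, and both versions assume there are no duplicate points (otherwise a point is not guaranteed to appear first in its own neighbour list):

```python
import numpy as np
from scipy.spatial import cKDTree, distance

def k_nearest_bruteforce(points, k):
    """Indices of the k nearest neighbours via the full distance matrix."""
    D = distance.squareform(distance.pdist(points))
    # Column 0 of the argsort is the point itself (distance 0), so skip it
    return np.argsort(D, axis=1)[:, 1:k + 1]

def k_nearest_kdtree(points, k):
    """Same result using a KD-tree, without building the N x N matrix."""
    tree = cKDTree(points)
    # Ask for k + 1 neighbours because the nearest hit is the query point itself
    _, idx = tree.query(points, k=k + 1)
    return idx[:, 1:]

rng = np.random.default_rng(0)   # illustrative seed
points = rng.random((10, 3))

brute = k_nearest_bruteforce(points, 3)
tree_based = k_nearest_kdtree(points, 3)
print(brute)
print(np.array_equal(brute, tree_based))
```

For a handful of points the brute-force version is simpler; the KD-tree variant is mostly worth considering when N gets large.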
numpy.linalg.norm and numpy.argsort might help. See stackoverflow.com/questions/1401712/… – dkato
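A NumPy-only sketch of what this comment seems to suggest, assuming broadcasting is used to form all pairwise difference vectors (this is one reading of the hint, not necessarily what the commenter had in mind; the variable names are illustrative):

```python
import numpy as np

N, k = 10, 3
rng = np.random.default_rng(0)   # illustrative seed
points = rng.random((N, 3))

# diff[i, j] is the vector from point j to point i, shape (N, N, 3)
diff = points[:, None, :] - points[None, :, :]
# Euclidean norm over the last axis gives the N x N distance matrix
D = np.linalg.norm(diff, axis=-1)

# Sort each row by distance and drop column 0, which is the point itself
closest = np.argsort(D, axis=1)[:, 1:k + 1]
print(closest)
```

Note that the intermediate `diff` array holds N * N * 3 values, so this is only practical for moderately sized point sets.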