i have numpy array in python which contains lots (10k+) of 3D vertex points (vectors with coordinates [x,y,z]). I need to calculate distance between all possible pairs of these points.
it's easy to do using scipy:
import scipy
D = spdist.cdist(verts, verts)
but i can't use this because of project policy on introducing new dependencies.
So i came up with this naive code:
def vert_dist(self, A, B):
return ((B[0]-A[0])**2+(B[1]-A[1])**2+(B[2]-A[2])**2)**(1.0/2)
# Pairwise distance between verts
#Use SciPy, otherwise use fallback
try:
import scipy.spatial.distance as spdist
D = spdist.cdist(verts, verts)
except ImportError:
#FIXME: This is VERY SLOW:
D = np.empty((len(verts), len(verts)), dtype=np.float64)
for i,v in enumerate(verts):
#self.app.setStatus(_("Calculating distance %d of %d (SciPy not installed => using SLOW AF fallback method)"%(i,len(verts))), True)
for j in range(i,len(verts)):
D[j][i] = D[i][j] = self.vert_dist(v,verts[j])
vert_dist() calculates 3D distance between two vertices and rest of code just iterates over vertices in 1D array and for each one it calculates distance to every other in the same array and produces 2D array of distances.
But that's extremely slow (1000 times) when compared to scipy's native C code. I wonder if i can speed it up using pure numpy. at least to some extent.
Some more info: https://github.com/scipy/scipy/issues/9172
BTW i've tried PyPy JIT compiler and it was even slower (10 times) than pure python.
UPDATE: I was able to speed things up a little like this:
def vert_dist_matrix(self, verts):
#FIXME: This is VERY SLOW:
D = np.empty((len(verts), len(verts)), dtype=np.float64)
for i,v in enumerate(verts):
D[i] = D[:,i] = np.sqrt(np.sum(np.square(verts-verts[i]), axis=1))
return D
This eliminates the inner loop by computing whole row at once, which makes stuff quite faster, but still noticeably slower than scipy. So i still look at @Divakar's solution
((B[0]-A[0])**2+(B[1]-A[1])**2+(B[2]-A[2])**2)**(1.0/2)
calculation rather than just subtraction. – Harvie.CZvert_dist()
function, which is fast enough. When i replace((B[0]-A[0])**2+(B[1]-A[1])**2+(B[2]-A[2])**2)**(1.0/2)
withnp.sqrt(np.sum(np.square(B-A)))
it's bit more readable, but it still does not eliminate the 2D for loops which are the actual slow part of this code. – Harvie.CZ