Fastest way to Calculate the Euclidian distance between 2 sets of vectors using numpy or scipy

Question

OK I have recently discovered that the the scipy.spatial.distance.cdist command is very quick for solving a COMPLETE distance matrix between two vector arrays for source and destination. see: How can the euclidean distance be calculated with numpy? I wanted to try to duplicate those performance gains when solving the distance between two equal sized arrays. The distance between two SINGLE vectors is rather straight forward to calculate as shown in the previous link. We can take vectors:

    import numpy as np
    A=np.random.normal(size=(3))
    B=np.random.normal(size=(3))

and then use ´numpy.linalg.norm´ where

    np.linalg.norm(A-B)

is equivalent to

    temp = A-B
    np.sqrt(temp[0]**2+temp[1]**2+temp[2]**2)

which works nicely however when I want to know the distance between two sets of vectors where my_distance = distance_between( A[i], B[i] ) for all i the second solution works perfectly. In that as expected:

    A=np.random.normal(size=(3,42))
    B=np.random.normal(size=(3,42))     
    temp = A-B
    np.sqrt(temp[0]**2+temp[1]**2+temp[2]**2)

gives me a set of 42 distances between the ith element of A to the ith element of B. Whereas the norm function correctly calculates the norm for the entire matrix giving me a single value that is not what I'm looking for. The behaviour with the 42 distances is what I want to maintain, hopefully with nearly as much speed as I get from cdist for solving complete matrices. So the question is whats the most efficient way using python and numpy/scipy to calculate i distances between data with shape (n,i)?

Thanks, Sloan

Rolf Bartstra Rolf Bartstra · Accepted Answer · 2012-12-10T18:06:42

I think you already cracked most of the case yourself. Instead of your last line, however, I would use:

np.sqrt(np.sum(temp**2,0))

Fastest way to Calculate the Euclidian distance between 2 sets of vectors using numpy or scipy

2 Answers