I have two numpy arrays, X
and Y
, with shapes (n,d)
and (m,d)
, respectively. Assume that we want to compute the Euclidean distances between each row of X
and each row of Y
and store the result in array Z
with shape (n,m)
. I have two implementations for this. The first implementation uses two for loops as follows:
for i in range(n):
for j in range(m):
Z[i,j] = np.sqrt(np.sum(np.square(X[i] - Y[j])))
The second implementation uses only one loop by vectorization:
for i in range(n):
Z[i] = np.sqrt(np.sum(np.square(X[i]-Y), axis=1))
When I run these codes on a particular X
and Y
data, the first implementation takes nearly 30 seconds while the second implementation takes nearly 60 seconds. I expect the second implementation to be faster since it uses vectorization. What is the reason of its slow running? I know that we can obtain faster implementations by fully vectorizing the code, but I don't understand why the second code (which is partially vectorized) is slower than non-vectorized version.
Here is the complete code:
n,m,d = 5000,500,3000
X = np.random.rand(n,d)
Y = np.random.rand(m,d)
Z = np.zeros((n,m))
tic = time.time()
for i in range(n):
for j in range(m):
Z[i,j] = np.sqrt(np.sum(np.square(X[i] - Y[j])))
print('Elapsed time 1: ', time.time()-tic)
tic = time.time()
for i in range(n):
Z[i] = np.sqrt(np.sum(np.square(X[i]-Y), axis=1))
print('Elapsed time 2: ', time.time()-tic)
tic = time.time()
train_squared = np.square(X).sum(axis=1).reshape((1,n))
test_squared = np.square(Y).sum(axis=1).reshape((m,1))
test_train = -2*np.matmul(Y, X.T)
dists = np.sqrt(test_train + train_squared + test_squared)
print('Elapsed time 3: ', time.time()-tic)
And this is the output:
Elapsed time 1: 35.659096002578735
Elapsed time 2: 65.57051086425781
Elapsed time 3: 0.3912069797515869
n,m,d
in your testing? – Divakarn=5000, m=500, d=3000
– Hossein(d,)
; the second version uses temporaries of shape(m, d)
, much bigger. I don't know what your fully vectorized solution looks like, so I don't know what size of temporaries that's producing. – user2357112 supports Monica