Since you've implied you don't need the full square matrix of results (you noted that cdist is awkward because it calculates every pairwise distance twice), you could use Numba to write a UDF that only fills the lower or upper triangle of the square matrix.
Note that the first time this is run there is overhead from the JIT compilation.
from scipy.spatial import distance
import pandas as pd
from numba import njit, prange
import numpy as np
@njit(parallel=True)
def euclidean_distance(coords1, coords2):
    # allocate output array
    c1_length, c2_length = len(coords1), len(coords2)
    out = np.empty(shape=(c1_length, c2_length), dtype=np.float64)

    # fill the lower triangle with the euclidean distance formula
    # assuming coordinates are (lat, lon) based on the example https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html
    for lat_ix in prange(c1_length):
        for lon_ix in prange(c2_length):
            if lat_ix >= lon_ix:  # do the reverse for the upper triangle
                out[lat_ix, lon_ix] = (
                    (coords1[lat_ix, 0] - coords2[lon_ix, 0]) ** 2
                    + (coords1[lat_ix, 1] - coords2[lon_ix, 1]) ** 2
                ) ** 0.5
            else:
                out[lat_ix, lon_ix] = 0
    return out
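If you want to exclude the one-time JIT compile cost from your timings, or you later need the full symmetric matrix, a minimal sketch of both (assuming the diagonal is zero, which it is since the distance from a point to itself is zero):

# warm up the JIT once on a tiny input so later calls don't pay compile time
_ = euclidean_distance(np.zeros((2, 2)), np.zeros((2, 2)))

# reconstruct the full square matrix from the lower triangle if it's ever needed;
# the diagonal is zero, so adding the transpose doesn't double-count anything
arr = np.random.normal(0, 100, (500, 2))
lower = euclidean_distance(arr, arr)
full = lower + lower.T
np.testing.assert_array_almost_equal(full, distance.cdist(arr, arr, 'euclidean'))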
for n in [10, 100, 5000, 20000]:
    arr = np.random.normal(0, 100, (n, 2))
    print(n, arr.shape)

    %time out = euclidean_distance(arr, arr)
    %time out_cdist = distance.cdist(arr, arr, 'euclidean')

    if n < 1000:
        np.testing.assert_array_almost_equal(out, np.tril(out_cdist))
    print()
Output:
10 (10, 2)
CPU times: user 987 ms, sys: 19.3 ms, total: 1.01 s
Wall time: 1.01 s
CPU times: user 79 µs, sys: 12 µs, total: 91 µs
Wall time: 95.1 µs
100 (100, 2)
CPU times: user 1.05 ms, sys: 404 µs, total: 1.45 ms
Wall time: 1.16 ms
CPU times: user 926 µs, sys: 254 µs, total: 1.18 ms
Wall time: 946 µs
5000 (5000, 2)
CPU times: user 125 ms, sys: 128 ms, total: 253 ms
Wall time: 75 ms
CPU times: user 184 ms, sys: 92.6 ms, total: 277 ms
Wall time: 287 ms
20000 (20000, 2)
CPU times: user 2.21 s, sys: 2.15 s, total: 4.36 s
Wall time: 2.55 s
CPU times: user 3.1 s, sys: 2.71 s, total: 5.81 s
Wall time: 31.9 s
With a 20,000-point array, the UDF is quite a bit faster since it skips half of the computation. cdist seems unexpectedly slow for this particular distribution of data at this scale on my MacBook Air, but the point stands regardless.
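For comparison (not part of the timings above), SciPy's pdist also avoids the duplicated work by returning only the n*(n-1)/2 unique pairwise distances in condensed form, which squareform can expand if the square matrix is ever needed; it may be worth benchmarking as well:

from scipy.spatial.distance import pdist, squareform

arr = np.random.normal(0, 100, (1000, 2))
condensed = pdist(arr, 'euclidean')  # 1-D array holding only the unique pairs
square = squareform(condensed)       # expand back to the full symmetric matrix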