
Say I have two sets of points, X and Y, possibly holding a different number of points and of different dimensionality. We can assume that X and Y are n x m NumPy arrays (n points, m dimensions each).

I would like to obtain the distribution (median and std) of sum(y-x) distances between the points in Y and X.

E.g., if one y point is (2,4) and one x point is (3,5), the sum(y-x) distance would be (2-3) + (4-5) = -2.

How can I do that in Python without looping?

If you want your arrays to have different dimensionality, they should not both have shape n x m; it would be better to make that clearer in the question. As explained in the answer below, broadcasting is the default way of handling this kind of thing without loops. However, depending on the length of your vectors, the 3D array it creates may be larger than you would like. - eickenberg

1 Answer


A quick browse through scipy.spatial.distance did not turn up anything suitable, so you likely need to use broadcasting:

>>> import numpy as np
>>> a = np.random.rand(5,3)  # (N x M)
>>> b = np.random.rand(4,3)  # (K x M)
>>> # dists[i, j] = sum(a[i] - b[j]), shape (N x K)
>>> dists = np.sum(a[:,None,:] - b, axis=-1)
>>> dists
array([[-0.57713957, -1.88996939, -0.13993727, -1.17222018],
       [ 0.89288677, -0.41994304,  1.33008907,  0.29780616],
       [ 0.45866859, -0.85416123,  0.89587088, -0.13641203],
       [ 1.12909228, -0.18373754,  1.56629457,  0.53401166],
       [ 0.64299673, -0.66983308,  1.08019903,  0.04791612]])
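
To make the shapes explicit, here is a small sketch that reuses the example points from the question and adds one extra y point. The variable names `x`, `y`, and `diffs` and the extra point are only illustrative. Indexing with `None` inserts a length-1 axis, so NumPy broadcasts the subtraction to one difference vector per pair of points:

```python
import numpy as np

x = np.array([[3.0, 5.0]])              # one x point, shape (1, 2)
y = np.array([[2.0, 4.0], [6.0, 1.0]])  # two y points, shape (2, 2)

# y[:, None, :] has shape (2, 1, 2); subtracting x (shape (1, 2))
# broadcasts to shape (2, 1, 2): one difference vector per (y, x) pair.
diffs = y[:, None, :] - x
dists = diffs.sum(axis=-1)              # shape (2, 1)

print(diffs.shape, dists.shape)
print(dists)
```

The first entry of `dists` is the -2 from the question's worked example.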

Now just grab the median and std:

>>> np.median(dists)
0.17286113728020264
>>> np.std(dists)
0.88228393506243197
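
If the intermediate (N x K x M) array is too large, as the comment on the question warns, one possible workaround relies on the fact that a sum of coordinate differences equals the difference of the coordinate sums: sum(a_i - b_j) = sum(a_i) - sum(b_j). A minimal sketch, with a seed and variable names chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for the example
a = rng.random((5, 3))           # (N x M)
b = rng.random((4, 3))           # (K x M)

# Sum each point's coordinates first, then broadcast the two 1-D results.
# This only allocates an (N x K) array, never (N x K x M).
dists_light = a.sum(axis=1)[:, None] - b.sum(axis=1)[None, :]

# Same result as the full 3-D broadcast, up to floating-point rounding.
dists_full = np.sum(a[:, None, :] - b, axis=-1)
print(np.allclose(dists_light, dists_full))
print(np.median(dists_light), np.std(dists_light))
```

This shortcut only works because the "distance" here is a plain signed sum; it would not carry over to something like Euclidean distance, where the differences are squared before summing.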