Pairwise weighted distance vectorization

Question

The following efficient and vectorized Matlab code computes the Weighted Euclidean Distance between 2 sets of points A and B using a weight vector WTS (1 weight for each dimension; same weights for all points):

    WTS = sqrt(WTS); 

    % modify A and B against weight values
    A = WTS(ones(1,size(A,1)),:).*A;
    B = WTS(ones(1,size(B,1)),:).*B; 

    % calculate distance
    AA = sum(A.*A,2);  
    BB = sum(B.*B,2)'; 
    D = sqrt(AA(:,ones(1,size(B,1))) + BB(ones(1,size(A,1)),:) - 2*A*B');

(source: https://github.com/nolanbconaway/pairdist/blob/master/pairdist.m)

My question is: is there an efficient vectorized form (Matlab, R or Julia are fine) for a similar computation with the difference that WTS is a set of weight vectors with the same size as A? In other words, instead of 1 weight vector, I need 1 weight vector for each point in A.

This answer seems to do what I need, but it is in Python and I'm not sure about how to convert it to Matlab/R/Julia: https://stackoverflow.com/a/19285289/834518

Also, not a duplicate of Efficiently calculating weighted distance in MATLAB, as that question deals with the single weight vector case and I'm explicitly asking for the N weight vectors case.

EDIT: example aplications: RBF Networks and Gaussian Mixture Models, where you (can) have 1 weight vector for each neuron/component. An efficient solution to the problem is essential for those kinds of problems.

Have you tried any changes yourself that might get you closer to a solution? What did you find? — mbauman
@rahnema1 Not a duplicate, that's the case with just 1 weight vector. — rcpinto
@MattB. It is indeed simple for the 1 weight vector case, but I simply can't see how to do the same when there are as much weight vectors as points in A, at least with as much efficiency. And I tried some not-so-sciency ad-hoc modifications such as multiplying just A by WTS or including it in the 2AB term, obtaining catastrophic results. In other words, I tried a lot of things before asking. — rcpinto
@MattB. Probably would need to compute a 3D matrix with all pairwise differences (and not distances) and then I'd be able to multiply each one with it's corresponding weight vector (I think that's what he did in the Python answer), but again, I'm not sure about how to do that efficiently. — rcpinto
@DanGetz As I commented under Distance.jl solution it requires that A and B to be of the same size and W to be a vector but OP wants A and B to be of different sizes and W to be a matrix with the same size as A. Can you please show your code? — rahnema1

Chris Rackauckas Chris Rackauckas · Accepted Answer · 2018-01-03T20:07:34

In Julia you don't have to vectorize it to be efficient, just write the loop and it'll be faster than these vectorized forms anyways because it can fuse and get rid of the temporaries. Here's a pretty efficient implementation of pairwise applies in Julia that you can work from. It has all of the bells and whistles, but you can pair it down if you want.

Note that vectorization isn't necessarily "fast", it's just faster than looping in R/Python/MATLAB because it's only doing a single function call into an optimized kernel written in a lower level language (C/C++) which is actually looping. But putting together vectorized functions usually has a lot of temporary allocations since each vectorized functions return arrays. Thus if you really need efficiency, you should avoid vectorizing in general and write it in a language that allows for low cost function calls / loops. This post explains more about issues with vectorization in high level languages.

That answers one of the three questions you have. I don't have a good answer for MATLAB or R.

Pairwise weighted distance vectorization

4 Answers