
In MATLAB/Octave, pairwise distances between matrices, as required for e.g. k-means, are calculated with a single function call (see cvKmeans.m): distFunc(Codebook, X), with two matrices of dimensions KxD as arguments.

In Eigen this can be done for a matrix and one vector by using broadcasting, as explained on eigen.tuxfamily.org:

 (m.colwise() - v).colwise().squaredNorm().minCoeff(&index);

However, in this case v is not just a vector, but a matrix. What's the equivalent one-liner in Eigen to calculate such pairwise (Euclidean) distances between all columns of two matrices?


2 Answers

1 vote

I think the appropriate solution is to abstract this functionality into a function. That function may well be templated; and it may well use a loop - the loop will be really short, after all. Many matrix operations are implemented using loops - that's not a problem.

For example, given your example of...

MatrixXd p0(2, 4);
p0 <<
    1, 23, 6, 9,
    3, 11, 7, 2;

MatrixXd p1(2, 2);
p1 <<
    2, 20,
    3, 10;

then we can construct a matrix D such that D(i,j) = |p0(i) - p1(j)|²:

MatrixXd D(p0.cols(), p1.cols());
for (int i = 0; i < p1.cols(); i++)
    D.col(i) = (p0.colwise() - p1.col(i)).colwise().squaredNorm().transpose();

I think this is fine - we can use some broadcasting to avoid 2 levels of nesting: we iterate over p1's points, but not over p0's points, nor over their dimensions.

However, you can make a one-liner if you observe that |p0(i) - p1(j)|² = |p0(i)|² + |p1(j)|² - 2 p0(i)ᵀp1(j). In particular, the last component is just matrix multiplication, so D = -2 p0ᵀp1 + ...

The blank left to be filled is composed of a component that only depends on the row; and a component that only depends on the column: these can be expressed using rowwise and columnwise operations.

The final "oneliner" is then:

D = ( (p0.transpose() * p1 * -2
      ).colwise() + p0.colwise().squaredNorm().transpose()
    ).rowwise() + p1.colwise().squaredNorm();

You could also replace the rowwise/colwise trickery with (outer) products with vectors of ones.

Both methods result in the following (squared) distances:

  1 410
505  10
 32 205
 50 185

You'd have to benchmark which is fastest, but I wouldn't be surprised to see the loop win, and I expect that's more readable too.

0 votes

Eigen is more of a headache than I thought on first sight.

  1. For example, there is no reshape() functionality (and conservativeResize is something else).
  2. It also seems (I'd like to be corrected) that Map does not just offer a view on the data; assignments to temporary variables seem to be required.
  3. The minCoeff function after the colwise operator cannot return both a minimum element and the index of that element.
  4. It is unclear to me if replicate actually allocates duplicates of the data. The whole point of broadcasting is that this should not be required.

    // Assumes Eigen 3; matrix_t and column_vector_t are typedefs:
    #include <iostream>
    #include <Eigen/Dense>

    typedef Eigen::MatrixXd matrix_t;
    typedef Eigen::VectorXd column_vector_t;

    matrix_t data(2,4);
    matrix_t means(2,2);
    
    // data points
    data << 1, 23, 6, 9,
            3, 11, 7, 2;
    
    // means
    means << 2, 20,
             3, 10;
    
    std::cout << "Data: " << std::endl;
    std::cout << data.replicate(2,1) << std::endl;
    
    column_vector_t temp1(4);
    temp1 = Eigen::Map<column_vector_t>(means.data(),4);
    
    std::cout << "Means: " << std::endl;
    std::cout << temp1.replicate(1,4) << std::endl;
    
    matrix_t temp2(4,4);
    temp2 = (data.replicate(2,1) - temp1.replicate(1,4));
    std::cout << "Differences: " << std::endl;
    std::cout << temp2 << std::endl; 
    
    matrix_t temp3(2,8);
    temp3 = Eigen::Map<matrix_t>(temp2.data(),2,8);
    std::cout << "Remap to 2xF: " << std::endl;
    std::cout << temp3 << std::endl;
    
    matrix_t temp4(1,8);
    temp4 = temp3.colwise().squaredNorm();
    std::cout << "Squared norm: " << std::endl;
    std::cout << temp4 << std::endl;//.minCoeff(&index);
    
    matrix_t temp5(2,4);
    temp5 = Eigen::Map<matrix_t>(temp4.data(),2,4);
    std::cout << "Squared norm result, the distances: " << std::endl;
    std::cout << temp5.transpose() << std::endl;
    
    //matrix_t::Index x, y;
    std::cout << "Cannot get the indices: " << std::endl;
    std::cout << temp5.transpose().colwise().minCoeff() << std::endl; // .minCoeff(&x,&y);
    

This is not a nice one-liner and seems overkill just to compare every column in data with every column in means and return a matrix of their squared distances. However, Eigen does not seem to be versatile enough for this to be written down much shorter.