2 votes

My hope is that this discussion might help anyone else having issues with Armadillo and Eigen3.

I've written a wrapper class, Mat, that wraps either an arma::Mat from the Armadillo library or an Eigen::Matrix from the Eigen3 library. The choice is controlled by a flag at compile time.

Additionally, I've written a Tensor class which uses a Mat as storage. The primary feature of this class is the use of Voigt notation to condense higher order tensors to be properly stored in a matrix.

Finally, I've written a test that multiplies a 2nd order tensor (i.e. a matrix) and a 1st order tensor (i.e. a vector) many times and records how long the operations take. I do this with my Mat class and with my Tensor class.
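The benchmark itself isn't shown here; roughly, it is a loop of the form below (a sketch only: timeProduct, the repetition count, and the use of std::chrono are my own placeholders, not the project's actual harness):

#include <chrono>

// Sketch of a timing helper: repeatedly evaluate aMatrix * aVector into aResult
// and return the elapsed time in seconds. Assigning into aResult forces both
// Armadillo and Eigen to actually evaluate the expression each iteration.
template< typename MatType, typename VecType, typename ResType >
double timeProduct( MatType const & aMatrix,
                    VecType const & aVector,
                    ResType       & aResult,
                    int             aReps )
{
    auto tStart = std::chrono::steady_clock::now();

    for ( int k = 0; k < aReps; ++k )
    {
        aResult = aMatrix * aVector;   // 2nd order tensor times 1st order tensor
    }

    auto tStop = std::chrono::steady_clock::now();
    return std::chrono::duration< double >( tStop - tStart ).count();
}

In a real benchmark the inputs would also be varied (or a do-not-optimize barrier used) so the compiler cannot hoist the product out of the loop.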

Because Tensor wraps Mat, I would expect its time to be larger. This is the case with Armadillo: Tensor is close to 20% slower on average. However, when using Eigen, Tensor is faster, which makes absolutely no sense to me.

Does anything stick out to anyone?

EDIT: Providing more details.

I've first wrapped arma::Mat into myOwn::armaMat and Eigen::Matrix into myOwn::eigenMat. Both of these simply wrap Armadillo's and Eigen's APIs into a common framework. Finally, based on a compiler flag, myOwn::Mat wraps either an armaMat or an eigenMat. I'm not sure about any optimization flags we have turned on.
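The selection itself isn't shown; the usual pattern is an alias (or a thin class) chosen by a preprocessor definition, something like the sketch below (MYOWN_USE_EIGEN is a made-up flag name):

namespace myOwn
{
#ifdef MYOWN_USE_EIGEN
    // Backed by Eigen::Matrix via the eigenMat wrapper
    template< typename T >
    using Mat = eigenMat< T >;
#else
    // Backed by arma::Mat via the armaMat wrapper
    template< typename T >
    using Mat = armaMat< T >;
#endif
}

Regarding the optimization flags: both libraries rely heavily on inlining, so wrapper layers like these only disappear with optimizations enabled (e.g. -O2/-O3), and -DNDEBUG (plus -DARMA_NO_DEBUG for Armadillo) removes bounds checking that would otherwise dominate a tight loop like this.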

As described above, myOwn::Tensor uses myOwn::Mat as a storage. Because of the physical applications I'll be using the Tensor class for, it is templated to be 2D (i.e. 2-by-2 if it's 2nd order) or 3D (i.e. 3-by-3). (In contrast, Mat can be of any size).
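In outline, the class looks something like this (a hedged reconstruction; the parameter names are mine, and their order is inferred from the operator* overload shown further down):

namespace myOwn
{
    // Order = tensor order (1 or 2), Dim = spatial dimension (2 or 3),
    // Sym = whether the tensor is symmetric.
    template< typename T, unsigned Order, unsigned Dim, bool Sym >
    class Tensor
    {
        Mat< T > mStorage;   // Voigt-condensed storage, e.g. 3-by-1 for a
                             // symmetric 2nd order tensor in 2D

    public:
        // Flat access into the Voigt-condensed storage
        T const & operator[]( unsigned aIndex ) const
        {
            return mStorage( aIndex );
        }
    };
}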

The operation I'm timing is a 2-by-2 matrix (2nd order tensor) times a 2-by-1 matrix (1st order tensor). When using only Mat, I'm essentially using Armadillo's or Eigen's expression templates.

When using my Tensor class, I'm overloading operator* like so:

template< typename T1, bool Sym >
myOwn::Mat< T1 >
operator*(
        myOwn::Tensor< T1, 2, 2, true > const & aTensor1,
        myOwn::Tensor< T1, 1, 2, Sym >  const & aTensor2 )
{
    // The result of (2nd order tensor) * (1st order tensor) is a plain 2-by-1 Mat
    myOwn::Mat< T1 > tVector(2, 1);

    // Voigt indices of the symmetric 2nd order tensor: [0] = (0,0), [1] = (1,1), [2] = (0,1)
    tVector(0) = aTensor1[0]*aTensor2[0] + aTensor1[2]*aTensor2[1];
    tVector(1) = aTensor1[2]*aTensor2[0] + aTensor1[1]*aTensor2[1];

    return tVector;
}

The [] operator on Tensor accesses data from the underlying storage Mat (via the Voigt convention).
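Spelled out for the symmetric, 2D case (a restatement of the convention, not code from the project), the Voigt mapping is

    A = [ A00  A01 ]   ->   aTensor1[0] = A00,  aTensor1[1] = A11,  aTensor1[2] = A01
        [ A01  A11 ]

so the two lines of the product above are just

    (A*b)(0) = A00*b0 + A01*b1 = aTensor1[0]*aTensor2[0] + aTensor1[2]*aTensor2[1]
    (A*b)(1) = A01*b0 + A11*b1 = aTensor1[2]*aTensor2[0] + aTensor1[1]*aTensor2[1]

where aTensor2[0] = b0 and aTensor2[1] = b1.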

What are the sizes of the matrices? Do you call Eigen/Armadillo's matrix-vector products in both the Mat and Tensor cases? Also, make sure to compile with compiler optimizations ON. Finally, if these remarks did not help you spot any issues, then it's hard to help further without seeing the code of what you are doing. - ggael
I've updated the question to address some of the issues you've brought up. - Luis Negrete

1 Answer

7 votes

"It's complicated."

We offer bindings for both Armadillo and Eigen to R via the add-on packages RcppArmadillo and RcppEigen, so the comparison and horse-race question comes up a lot.

And I don't think there is a clear answer. To make matters "worse", Armadillo generally defers to whichever LAPACK/BLAS you have installed, so you could be getting multicore parallelism from it, whereas Eigen tends to prefer its own routines. When preparing my Rcpp book, I did some timings and found some counter-intuitive results.

At the end of the day you may just need to profile your problem.