I am writing a general purpose library using Eigen for computational mechanics,
dealing mostly with 6x6 sized matrices and 6x1 sized vectors.
I am considering using the Eigen::Ref<> template so that the library also works with segments and blocks, as documented in http://eigen.tuxfamily.org/dox/TopicFunctionTakingEigenTypes.html and in "Correct usage of the Eigen::Ref<> class".
However, a small performance comparison suggests that Eigen::Ref has considerable overhead for such small functions compared to standard C++ references:
#include <ctime>
#include <iostream>
#include "Eigen/Core"

Eigen::Matrix<double, 6, 6> testRef(const Eigen::Ref<const Eigen::Matrix<double, 6, 6>>& A)
{
    Eigen::Matrix<double, 6, 6> temp = (A * A) * A;
    temp.diagonal().setOnes();
    return temp;
}

Eigen::Matrix<double, 6, 6> testNoRef(const Eigen::Matrix<double, 6, 6>& A)
{
    Eigen::Matrix<double, 6, 6> temp = (A * A) * A;
    temp.diagonal().setOnes();
    return temp;
}

int main(){
    using namespace std;
    int cycles = 10000000;
    Eigen::Matrix<double, 6, 6> testMat;
    testMat = Eigen::Matrix<double, 6, 6>::Ones();

    clock_t begin = clock();
    for(int i = 0; i < cycles; i++)
        testMat = testRef(testMat);
    clock_t end = clock();
    double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
    std::cout << "Ref: " << elapsed_secs << std::endl;

    begin = clock();
    for(int i = 0; i < cycles; i++)
        testMat = testNoRef(testMat);
    end = clock();
    elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
    std::cout << "noRef : " << elapsed_secs << std::endl;
    return 0;
}
Output with gcc -O3:
Ref: 1.64066
noRef : 1.1281
So it seems that Eigen::Ref has considerable overhead, at least in cases with low actual computational effort.
On the other hand, the approach using const Eigen::Matrix<double, 6, 6>& A leads to unnecessary copies if blocks or segments are passed:
#include <Eigen/Core>
#include <iostream>

void test(const Eigen::Vector3d& a)
{
    std::cout << "addr in function " << &a << std::endl;
}

int main() {
    Eigen::Vector3d aa;
    aa << 1, 2, 3;
    std::cout << "addr outside function " << &aa << std::endl;
    test(aa);
    test(aa.head(3));
    return 0;
}
Output:
addr outside function 0x7fff85d75960
addr in function 0x7fff85d75960
addr in function 0x7fff85d75980
So this approach is excluded for the general case.
Alternatively, one could make function templates using Eigen::MatrixBase, as described in the documentation. However, this seems to be inefficient for large libraries, and it cannot be adapted to fixed size matrices (6x6, 6x1) as in my case.
Is there any other alternative? What is the general recommendation for large general purpose libraries?
Thank you in advance!
edit: Modified the first benchmark example according to the recommendations in the comments
Ref: 0.069 noRef : 0.069 If performance without optimizations is important, then yeah, Eigen in general will have a huge overhead, but that mostly disappears with optimizations. - PeterT

-O2)? If not, your results are not trustworthy, but if... Your test functions have no side effects. This bears the danger that Eigen::Matrix<double, 6, 6> temp = (A * A); is optimized away. You should return the values, store them e.g. in a vector, and print them (after measuring) to prevent their "vanishing" due to optimization. A look into the assembly could also help to uncover what of your code actually reaches the binary. - Scheff's Cat