In my project I use the Eigen 3.3 library to do calculations with 6x6 matrices. I decided to investigate whether AVX instructions really give me any speedup over SSE. My CPU supports both instruction sets:
model name : Intel(R) Xeon(R) CPU E5-1607 v2 @ 3.00GHz
flags : ... sse sse2 ... ssse3 ... sse4_1 sse4_2 ... avx ...
So I compile the small test shown below with GCC 4.8 using two different sets of flags:
$ g++ test-eigen.cxx -o test-eigen -march=native -O2 -mavx
$ g++ test-eigen.cxx -o test-eigen -march=native -O2 -mno-avx
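(As a side check, not part of the measurement: one quick way to see which vector ISA macros a given flag combination actually enables, assuming GCC as above, is to dump the predefined macros. With -mno-avx in place of -mavx, the __AVX__ macro disappears from the output.)
$ g++ -march=native -O2 -mavx -dM -E - < /dev/null | grep -i avx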
I confirmed that the second case with -mno-avx did not produce any instructions with ymm registers. Nevertheless, the two cases give me very similar results of about 520 ms as measured with perf.
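(For reference, one way to check for ymm usage is to disassemble the binary and count ymm operands, assuming binutils objdump is available; this prints 0 for the -mno-avx build:)
$ objdump -d test-eigen | grep -c ymm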
Here is the program test-eigen.cxx (it inverts the sum of two matrices, just to stay close to the actual task I am working on):
#define NDEBUG
#include <iostream>
#include "Eigen/Dense"
using namespace Eigen;
int main()
{
    typedef Matrix<float, 6, 6> MyMatrix_t;

    MyMatrix_t A = MyMatrix_t::Random();
    MyMatrix_t B = MyMatrix_t::Random();
    MyMatrix_t C = MyMatrix_t::Zero();
    MyMatrix_t D = MyMatrix_t::Zero();
    MyMatrix_t E = MyMatrix_t::Constant(0.001);

    // Make A and B symmetric positive definite matrices
    A.diagonal() = A.diagonal().cwiseAbs();
    A.noalias() = MyMatrix_t(A.triangularView<Lower>()) * MyMatrix_t(A.triangularView<Lower>()).transpose();
    B.diagonal() = B.diagonal().cwiseAbs();
    B.noalias() = MyMatrix_t(B.triangularView<Lower>()) * MyMatrix_t(B.triangularView<Lower>()).transpose();

    for (int i = 0; i < 1000000; i++)
    {
        // Calculate C = (A + B)^-1
        C = (A + B).llt().solve(MyMatrix_t::Identity());
        D += C;

        // Somehow modify A and B so they remain symmetric
        A += B;
        B += E;
    }

    std::cout << D << "\n";
    return 0;
}
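As a possible sanity check (not something I have relied on above), Eigen can report which SIMD code paths it was compiled with via Eigen::SimdInstructionSetsInUse(). A minimal sketch, built with the same flags as above:
#include <iostream>
#include "Eigen/Dense"
int main()
{
    // Prints something like "SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2",
    // or a string that starts with "AVX" when the AVX code paths are compiled in.
    std::cout << Eigen::SimdInstructionSetsInUse() << "\n";
    return 0;
}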
Should I really expect better performance with AVX in Eigen? Or am I missing something in the compiler flags or in the Eigen configuration? It is possible that my test is not suitable to demonstrate the difference, but I don't see what might be wrong with it.