I was trying to do numerical linear algebra computations in C++. I had used Python NumPy for quick prototyping, and I wanted to find a C++ linear algebra package for a further speedup. Eigen seemed like a good starting point.
I wrote a small benchmark that multiplies two large dense matrices to compare processing speed. In NumPy I was doing this:
import numpy as np
import time

a = np.random.uniform(size=(5000, 5000))
b = np.random.uniform(size=(5000, 5000))

start = time.time()
c = np.dot(a, b)
print((time.time() - start) * 1000, 'ms')
In C++ Eigen I was doing this:
#include <time.h>
#include <iostream>
#include "Eigen/Dense"

using namespace std;
using namespace Eigen;

int main() {
    MatrixXf a = MatrixXf::Random(5000, 5000);
    MatrixXf b = MatrixXf::Random(5000, 5000);
    clock_t start = clock();
    MatrixXf c = a * b;
    cout << (double)(clock() - start) / CLOCKS_PER_SEC * 1000 << " ms" << endl;
    return 0;
}
I searched the Eigen documentation and Stack Overflow for compiler optimization flags, and compiled the program with this command:
g++ -g test.cpp -o test -Ofast -msse2
The C++ executable compiled with -Ofast runs about 30x faster than an unoptimized build; it returns the result in roughly 10000 ms on my 2015 MacBook Pro.
Meanwhile, NumPy returns the result in about 1800 ms.
I was expecting Eigen to outperform NumPy, so this result fell short of my expectation.
Are there any compile flags I missed that would further boost Eigen's performance here? Or is there a multithreading switch that can be turned on for an extra performance gain? I am just curious about this.
Thank you very much!
Edit on April 17, 2016:
After doing some searching based on @ggael's answer, I have come up with an answer to this question.
The best solution is to compile with Intel MKL linked as the backend for Eigen. For an OS X system the library can be found here. With MKL installed, I used the Intel MKL link line advisor to enable MKL backend support for Eigen.
I compiled like this to enable all of MKL:
g++ -DEIGEN_USE_MKL_ALL -L${MKLROOT}/lib -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -liomp5 -lpthread -lm -ldl -m64 -I${MKLROOT}/include -I. -Ofast -DNDEBUG test.cpp -o test
If you get an environment variable error for MKLROOT, just run the environment setup script shipped with the MKL package, which is installed by default at /opt/intel/mkl/bin on my device.
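As a sketch (assuming the default install location mentioned above and MKL's standard mklvars.sh script name), the environment can be set up like this:

```shell
# Run MKL's environment script so MKLROOT and the library paths are
# exported into the current shell; "intel64" selects the 64-bit libraries.
source /opt/intel/mkl/bin/mklvars.sh intel64

# Sanity check: MKLROOT should now point at the MKL install.
echo "$MKLROOT"
```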
With MKL as the Eigen backend, the multiplication of two 5000x5000 matrices finishes in about 900 ms on my 2.5 GHz MacBook Pro. This is much faster than NumPy on my device.
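For completeness, Eigen can also multithread its own matrix-product kernels without MKL when the program is built with OpenMP support. This is a hedged sketch of such a build (the flags beyond those in my original command are additions for illustration); the thread count can then be capped at runtime:

```shell
# Build with Eigen's built-in OpenMP parallelization enabled.
# -march=native lets the compiler use all vector instructions of this CPU.
g++ -Ofast -march=native -fopenmp -DNDEBUG -I. test.cpp -o test

# Limit the number of threads OpenMP (and thus Eigen) may use.
OMP_NUM_THREADS=4 ./test
```
Remember to measure wall-clock time rather than clock() when comparing the multithreaded build, since clock() sums CPU time over all threads.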