7
votes

I'm studying simple multiplication of two big matrices using the Eigen library. This multiplication appears to be noticeably slower than both Matlab and Python for the same size matrices.

Is there anything to be done to make the Eigen operation faster?

Problem Details

X : random 1000 x 50000 matrix

Y : random 50000 x 300 matrix

Timing experiments (on my late 2011 Macbook Pro)

Using Matlab: X*Y takes ~1.3 sec

Using Enthought Python: numpy.dot(X, Y) takes ~2.2 sec

Using Eigen: X*Y takes ~2.7 sec

Eigen Details

You can get my Eigen code (as a MEX function): https://gist.github.com/michaelchughes/4742878

This MEX function reads in two matrices from Matlab, and returns their product.

Running this MEX function without the matrix product operation (i.e. just doing the IO) shows negligible overhead, so the IO between the function and Matlab doesn't explain the big performance gap. It's clearly the matrix product operation itself.

I'm compiling with g++, with these optimization flags: "-O3 -DNDEBUG"

I'm using the latest stable Eigen header files (3.1.2).

Any suggestions on how to improve Eigen's performance? Can anybody replicate the gap I'm seeing?

UPDATE: The compiler really seems to matter. The original Eigen timing was done using Apple Xcode's version of g++: llvm-g++-4.2.

When I use g++-4.7 downloaded via MacPorts (same CXXOPTIMFLAGS), I get 2.4 sec instead of 2.7.

Any other suggestions of how to compile better would be much appreciated.

You can also get raw C++ code for this experiment: https://gist.github.com/michaelchughes/4747789

./MatProdEigen 1000 50000 300

reports 2.4 seconds under g++-4.7

3
Do you know what algorithm it implements? It may just be using a poor matrix multiplication algorithm. One other thing to try is enabling auto-vectorization: gcc.gnu.org/projects/tree-ssa/vectorization.html (not on by default, I think, but I'm not sure). If you're on an Intel machine, try the Intel compiler; I've noticed it consistently outperforms the others at optimization. Also see eigen.tuxfamily.org/index.php?title=FAQ#Vectorization – thang
@thang: Eigen was designed for linear algebra, so I'd be surprised if the algorithm used is that bad. Tree vectorization is enabled by default with the "-O3" optimization flag I'm using, according to your link, so that's not the issue AFAIK. I might try the Intel compiler if no other suggestions crop up. – Mike Hughes
@MikeHughes, you could also try plotting the growth rate as the matrix size increases; that should give an indication of which algorithm it uses. Or dig into the source or documentation. – thang
Hi, it took me about 260 secs to run the test C++ code on my machine (VS2012 on Windows, Core i5-4570), while the matrix multiply itself took about 1.3 secs. That's quite weird. – user978112

3 Answers

12
votes

First of all, when doing performance comparisons, make sure you disable turbo-boost (TB). On my system, using gcc 4.5 from MacPorts and without turbo-boost, I get 3.5 s, which corresponds to 8.4 GFLOPS, while the theoretical peak of my 2.3 GHz Core i7 is 9.2 GFLOPS, so not too bad.

Matlab is based on the Intel MKL, and judging from the reported performance, it is clearly using a multithreaded version. It is unlikely that a small library like Eigen can beat Intel on its own CPU!

NumPy can use any BLAS library: ATLAS, MKL, OpenBLAS, eigen-blas, etc. I guess that in your case it was using ATLAS, which is fast too.

Finally, here is how you can get better performance: enable multithreading in Eigen by compiling with -fopenmp. By default, Eigen uses the number of threads defined by OpenMP. Unfortunately, this corresponds to the number of logical cores, not physical cores, so make sure hyper-threading is disabled, or set the OMP_NUM_THREADS environment variable to the number of physical cores. Here I get 1.25 s (without TB), and 0.95 s with TB.

2
votes

Matlab is faster because it uses the Intel MKL. Eigen can use it too (see here), but of course you need to buy it.

That being said, there are a number of reasons Eigen can be slower. To compare Python vs Matlab vs Eigen, you'd really need to code three equivalent versions of an operation in the respective languages. Also note that Matlab caches results, so you'd really need to start from a fresh Matlab session to be sure its magic isn't fooling you.

Also, Matlab's MEX overhead is not zero. The OP there reports that newer versions "fix" the problem, but I'd be surprised if all the overhead has been eliminated.

2
votes

Eigen doesn't take advantage of the AVX instructions introduced with Intel's Sandy Bridge architecture. This probably explains most of the performance difference between Eigen and MATLAB. I found a branch that adds support for AVX at https://bitbucket.org/benoitsteiner/eigen but, as far as I can tell, it has not been merged into the Eigen trunk yet.