Why does MATLAB/Octave wipe the floor with C++ in Eigenvalue Problems?

Question

I'm hoping that the answer to the question in the title is that I'm doing something stupid!

Here is the problem. I want to compute all the eigenvalues and eigenvectors of a real, symmetric matrix. I have implemented code in MATLAB (actually, I run it using Octave), and C++, using the GNU Scientific Library. I am providing my full code below for both implementations.

As far as I can understand, GSL comes with its own implementation of the BLAS API, (hereafter I refer to this as GSLCBLAS) and to use this library I compile using:

g++ -O3 -lgsl -lgslcblas

GSL suggests here to use an alternative BLAS library, such as the self-optimizing ATLAS library, for improved performance. I am running Ubuntu 12.04, and have installed the ATLAS packages from the Ubuntu repository. In this case, I compile using:

g++ -O3 -lgsl -lcblas -latlas -lm

For all three cases, I have performed experiments with randomly-generated matrices of sizes 100 to 1000 in steps of 100. For each size, I perform 10 eigendecompositions with different matrices, and average the time taken. The results are these:

Plot of Results

The difference in performance is ridiculous. For a matrix of size 1000, Octave performs the decomposition in under a second; GSLCBLAS and ATLAS take around 25 seconds.

I suspect that I may be using the ATLAS library incorrectly. Any explanations are welcome; thanks in advance.

Some notes on the code:

In the C++ implementation, there is no need to make the matrix symmetric, because the function only uses the lower triangular part of it.
In Octave, the line triu(A) + triu(A, 1)' enforces the matrix to be symmetric.
If you wish to compile the C++ code your own Linux machine, you also need to add the flag -lrt, because of the clock_gettime function.
Unfortunately I don't think clock_gettime exits on other platforms. Consider changing it to gettimeofday.

Octave Code

K = 10;

fileID = fopen('octave_out.txt','w');

for N = 100:100:1000
    AverageTime = 0.0;

    for k = 1:K
        A = randn(N, N);
        A = triu(A) + triu(A, 1)';
        tic;
        eig(A);
        AverageTime = AverageTime + toc/K;
    end

    disp([num2str(N), " ", num2str(AverageTime), "\n"]);
    fprintf(fileID, '%d %f\n', N, AverageTime);
end

fclose(fileID);

C++ Code

#include <iostream>
#include <fstream>
#include <time.h>

#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_eigen.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_matrix.h>

int main()
{
    const int K = 10;

    gsl_rng * RandomNumberGenerator = gsl_rng_alloc(gsl_rng_default);
    gsl_rng_set(RandomNumberGenerator, 0);

    std::ofstream OutputFile("atlas.txt", std::ios::trunc);

    for (int N = 100; N <= 1000; N += 100)
    {
        gsl_matrix* A = gsl_matrix_alloc(N, N);
        gsl_eigen_symmv_workspace* EigendecompositionWorkspace = gsl_eigen_symmv_alloc(N);
        gsl_vector* Eigenvalues = gsl_vector_alloc(N);
        gsl_matrix* Eigenvectors = gsl_matrix_alloc(N, N);

        double AverageTime = 0.0;
        for (int k = 0; k < K; k++)
        {   
            for (int i = 0; i < N; i++)
            {
                for (int j = 0; j < N; j++)
                {
                    gsl_matrix_set(A, i, j, gsl_ran_gaussian(RandomNumberGenerator, 1.0));
                }
            }

            timespec start, end;
            clock_gettime(CLOCK_MONOTONIC_RAW, &start);

            gsl_eigen_symmv(A, Eigenvalues, Eigenvectors, EigendecompositionWorkspace);

            clock_gettime(CLOCK_MONOTONIC_RAW, &end);
            double TimeElapsed = (double) ((1e9*end.tv_sec + end.tv_nsec) - (1e9*start.tv_sec + start.tv_nsec))/1.0e9;
            AverageTime += TimeElapsed/K;
            std::cout << "N = " << N << ", k = " << k << ", Time = " << TimeElapsed << std::endl;
        }
        OutputFile << N << " " << AverageTime << std::endl;

        gsl_matrix_free(A);
        gsl_eigen_symmv_free(EigendecompositionWorkspace);
        gsl_vector_free(Eigenvalues);
        gsl_matrix_free(Eigenvectors);
    }

    return 0;
}

What is the output of ldd $(which octave) ? Do you see references to OpenBLAS or ATLAS in it ? What CPU usage does top show for octave or the C++ version when you run them? — damienfrancois
you didn't ask octave to return the eigenvectors which makes the problem easier. — Memming

Micah Micah · Accepted Answer · 2013-12-06T22:03:07

I disagree with the previous post. This is not a threading issue, this is an algorithm issue. The reason matlab, R, and octave wipe the floor with C++ libraries is because their C++ libraries use more complex, better algorithms. If you read the octave page you can find out what they do[1]:

Eigenvalues are computed in a several step process which begins with a Hessenberg decomposition, followed by a Schur decomposition, from which the eigenvalues are apparent. The eigenvectors, when desired, are computed by further manipulations of the Schur decomposition.

Solving eigenvalue/eigenvector problems is non-trivial. In fact its one of the few things "Numerical Recipes in C" recommends you don't implement yourself. (p461). GSL is often slow, which was my initial response. ALGLIB is also slow for its standard implementation (I'm getting about 12 seconds!):

#include <iostream>
#include <iomanip>
#include <ctime>

#include <linalg.h>

using std::cout;
using std::setw;
using std::endl;

const int VERBOSE = false;

int main(int argc, char** argv)
{

    int size = 0;
    if(argc != 2) {
        cout << "Please provide a size of input" << endl;
        return -1;
    } else {
        size = atoi(argv[1]);
        cout << "Array Size: " << size << endl;
    }

    alglib::real_2d_array mat;
    alglib::hqrndstate state;
    alglib::hqrndrandomize(state);
    mat.setlength(size, size);
    for(int rr = 0 ; rr < mat.rows(); rr++) {
        for(int cc = 0 ; cc < mat.cols(); cc++) {
            mat[rr][cc] = mat[cc][rr] = alglib::hqrndnormal(state);
        }
    }

    if(VERBOSE) {
        cout << "Matrix: " << endl;
        for(int rr = 0 ; rr < mat.rows(); rr++) {
            for(int cc = 0 ; cc < mat.cols(); cc++) {
                cout << setw(10) << mat[rr][cc];
            }
            cout << endl;
        }
        cout << endl;
    }

    alglib::real_1d_array d;
    alglib::real_2d_array z;
    auto t = clock();
    alglib::smatrixevd(mat, mat.rows(), 1, 0, d, z);
    t = clock() - t;

    cout << (double)t/CLOCKS_PER_SEC << "s" << endl;

    if(VERBOSE) {
        for(int cc = 0 ; cc < mat.cols(); cc++) {
            cout << "lambda: " << d[cc] << endl;
            cout << "V: ";
            for(int rr = 0 ; rr < mat.rows(); rr++) {
                cout << setw(10) << z[rr][cc];
            }
            cout << endl;
        }
    }
}

If you really need a fast library, probably need to do some real hunting.

[1] http://www.gnu.org/software/octave/doc/interpreter/Basic-Matrix-Functions.html

Why does MATLAB/Octave wipe the floor with C++ in Eigenvalue Problems?

3 Answers