2
votes

I have a sparse linear system Ax = b. In my application, A is a symmetric sparse matrix with typical size around 2,500,000 x 2,500,000, with non-zeros on main diagonal and on another diagonal (plus the symmetric to this one). This makes it 2-3 non-zeros per row/col. 

To test my code, I am comparing MATLAB and Eigen. I created a 1,000,000 x 1,000,000 sparse matrix A. In MATLAB, I simply use x = A\b and it takes about 8 seconds. In Eigen, I have tried several solvers. SuperLU takes about 150 s. SimplicialCholesky takes about 300 seconds. UmfPackLU takes about 490 s. These times are too long for me; on real data, it just takes too long to be useful. Other solvers give completely different results compared to MATLAB, iterative solvers took too long. SimplicialCholesky, SuperLU and UmfPackLU give similar (they differ at decimal places), so I hope this counts as same.   Eigen code: 

// prepare sparse matrix A
    std::vector<T> tripletList; // I am leaving filling the triplet list out
    Eigen::SparseMatrix<float> A(k, k); // k is usually around 2500000, in the test case I described here it is 1000000
    A.setFromTriplets(tripletList.begin(), tripletList.end());
    A.makeCompressed();

// prepare vector b
    Eigen::Map<Eigen::VectorXf> b; // vector b is filled with values

// calculate A x = b and measure time - for SimplicialCholesky
    t1 = std::chrono::steady_clock::now();
    Eigen::SimplicialCholesky<Eigen::SparseMatrix<float>> solver_chol(A);
    x = solver_chol.solve(b);
    t2 = std::chrono::steady_clock::now();
    log_file << "SimlicialCholeskytime: t2 - t1 = " << std::chrono::duration_cast<std::chrono::seconds>(t2 - t1).count() << " s \n";

// calculate A x = b and measure time - for SparseLU
    t1 = std::chrono::steady_clock::now();
    Eigen::SparseLU<Eigen::SparseMatrix<float>> solver_slu(A);
    x = solver_slu.solve(b);
    t2 = std::chrono::steady_clock::now();
    log_file << "SparseLU time: t2 - t1 = " << std::chrono::duration_cast<std::chrono::seconds>(t2 - t1).count() << " s \n";

// calculate A x = b and measure time - for UmfPackLU - here I had to convert to double.
    Eigen::SparseMatrix<double> Ad = A.cast <double>();
    Ad.makeCompressed();
    Eigen::VectorXd bd = b.cast <double>();
    t1 = std::chrono::steady_clock::now();
    Eigen::UmfPackLU<Eigen::SparseMatrix<double>> solver(Ad);
    Eigen::VectorXd xd = solver.solve(bd);
    t2 = std::chrono::steady_clock::now();
    log_file << "UmfPackLU time: t2 - t1 = " << std::chrono::duration_cast<std::chrono::seconds>(t2 - t1).count() << " s \n";

Perhaps I should mention that calculation runs on all 8 cores, so when I watch the time, I get 8 times, which I sum up. Also, the calculation is (so far) wrapped in .dll library .cu, it will be parallelized via CUDA in the next step. I measured times for all methods separately in order to avoid some counting overlapping.

I found the following possible solutions to speed up the calculation:

Is there anything I can do to speed up calculations using Eigen, so it takes a similar time as MATLAB? Am I using the correct solver, regarding the size and sparsity of matrix? Am I using current solvers correctly? Do I have to do some additional setup, include some other libraries? If it is not possible, are there some other libraries I could use? 

I am working on Windows 10, 64bit machine. I have Visual Studio 2019. 

1
This Stack Overflow question is probably relevant. Long story short: MATLAB is rather expensive, because it pays a lot of engineers to optimise their library beyond what any other programming language can offer. I doubt you can beat MATLAB's speed on your own, using open-source libraries. I'd say your best bet is to somehow use a mathematical/computational trick which exploits the specific structure/pattern of your sparse matrix, such as symmetry, positive-definiteness etc. - Adriaan
“compiler - maximum optimizations, favor speed“ — Try different optimization levels, sometimes -O2 is faster than -O3. Also turn on architecture-specific options, in GCC this is -march, and can make a big difference. - Cris Luengo
“when I watch the time, I get 8 times, which I sum up” — This makes no sense. You’re basically multiplying your timing by 8? Do you do this for MATLAB too? Your code shows you use steady_clock, which measures wall time. This is the fair value to compare. - Cris Luengo
Thank you. My aim is not really to beat Matlab, I am more looking for a way to improve my solution in C++/CUDA. The times are way too slow and from what I have read, they could be faster than I currently have. I am a beginner and I feel I might miss something because of lack of experience. - Dalecanka
Im sure CUDA is a good tool for this. But providing a full solution of a sparse solver is too much for here! Check cuSparse and its friends. - Ander Biguri

1 Answers

0
votes

I have tried many linear solvers recently for my spectral collocation solver, and I found that the "armadillo" is the fast one that solves dense Ax=b based on openblas library. Eigen3.3 is very slow even with "setNumbthreads", I still can't find the reason. If you want to solve it with Cuda or OpenMP. I strongly suggest you to use paralution library. it works fine for my problem. Regards

http://www.paralution.com/