Eigen LSCG solver performance issue

Question

I am using Eigen's LSCG to iteratively solve an over-determined system which I have expressed using sparse matrices and I believe is too slow. By iteratively I mean something like:

    //The following is pseudo code
    main() {
    //compute A
    A=..
    //compute B
    B=..
    while(someCondition)
    {   
        x=solve(A,B)
        //based on x alter A and B  
        A=..
        B=..
    }       
}

For example when A has 36k rows and 18k cols with 145k nnz elements and B has 36k rows 3 cols and 110k nnz elements (notice that B is in fact dense) it takes my desktop 74s to execute x=solve(A,B).

Are these times normal? I guess the answer depends on the machine the code is being ran on. I have an AMD FX 6300 so I suppose the hardware is not the main problem.
If it is indeed slow what might be the problem? Does the fact that B matrix is not in reality dense slow the solving step down?

To test the time on your machine I have written some simple test code:

#include <Eigen/Sparse> //system solving and Eigen::SparseMatrix
#include <ctime> //measure time to execute
#include <unsupported/Eigen/SparseExtra> //loadMarket

using SpMatrix = Eigen::SparseMatrix<double>;
using Matrix = Eigen::MatrixXd;
int main() {

  //load A and B
  SpMatrix A, B;
  Eigen::loadMarket(A, "/AMatrixDirectory/A.mtx");
  Eigen::loadMarket(B, "/BMatrixDirectory/B.mtx");

  std::clock_t start;

  start = std::clock();

  Eigen::LeastSquaresConjugateGradient<SpMatrix> solver;
  solver.compute(A);
  Matrix x = solver.solve(B);

  std::cout << "Time: " << (std::clock() - start) / (double)(CLOCKS_PER_SEC)
            << " s" << std::endl;

  return 0;
}

The above is one iteration of the while loop in the pseudo code above. To run the above code you will need:

Eigen
C++11 (if not replace using with typedefs)
The matrices A and B which I have uploaded here

Edit

ggael proposed to use SimplicialLDLT claiming that it has a better performance compared to LSCG in my problem.

For the purpose of comparing Eigen's solvers performance to a particular problem Eigen offers the BenchmarkRoutine . Unfortunately I was not able to use it since only square matrices can be used with it.

I edited the code comparing LSCG and SimplicialLDLT (please do not get discouraged by the length of the code since it consists of 4 blocks similar to each other with only some minor changes):

#include <Eigen/Sparse> //system solving and Eigen::SparseMatrix
#include <ctime>        //measure time to execute
#include <unsupported/Eigen/SparseExtra> //loadMarket

// Use typedefs instead of using if c++11 is not supported by your compiler
using SpMatrix = Eigen::SparseMatrix<double>;
using Matrix = Eigen::MatrixXd;

int main() {
  // Use multi-threading. If you don't have OpenMP on your system
  // it will simply use 1 thread and it will not crash. So do not worry about
  // that.
  Eigen::initParallel();
  Eigen::setNbThreads(6);

  // Load system matrices
  SpMatrix A, B;
  Eigen::loadMarket(A, "/home/iason/Desktop/Thesis/build/A.mtx");
  Eigen::loadMarket(B, "/home/iason/Desktop/Thesis/build/B.mtx");

  // Initialize the clock which will measure the solvers
  std::clock_t start;

  {
    // Reset clock
    start = std::clock();
    // Solve system using LSCG
    Eigen::LeastSquaresConjugateGradient<SpMatrix> LSCG_solver;
    LSCG_solver.setTolerance(pow(10, -10));
    LSCG_solver.compute(A);
    // Use auto specifier to hold the return value of solve. Eigen::Solve object
    // is being returned
    auto LSCG_solution = LSCG_solver.solve(B);
    std::cout << "LSCG Time Using auto: "
              << (std::clock() - start) / (double)(CLOCKS_PER_SEC) << " s"
              << std::endl;
  }
  {
    // Reset clock
    start = std::clock();
    // Solve system using LSCG
    Eigen::LeastSquaresConjugateGradient<SpMatrix> LSCG_solver;
    LSCG_solver.setTolerance(pow(10, -10));
    LSCG_solver.compute(A);
    // Use Matrix specifier instead of auto. Implicit conversion taking place(?)
    Matrix LSCG_solution = LSCG_solver.solve(B);
    std::cout << "LSCG Time Using Matrix: "
              << (std::clock() - start) / (double)(CLOCKS_PER_SEC) << " s"
              << std::endl;
  }
  {
    // Reset clock
    start = std::clock();
    // Solve system using SimplicialLDLT
    Eigen::SimplicialLDLT<SpMatrix> SLDLT_solver;
    SLDLT_solver.compute(A.transpose() * A);
    auto SLDLT_solution = SLDLT_solver.solve(A.transpose() * B);

    std::cout << "SimplicialLDLT Time Using auto: "
              << (std::clock() - start) / (double)(CLOCKS_PER_SEC) << " s"
              << std::endl;
  }
  {
    // Reset clock
    start = std::clock();
    // Solve system using SimplicialLDLT
    Eigen::SimplicialLDLT<SpMatrix> SLDLT_solver;
    SLDLT_solver.compute(A.transpose() * A);
    // Use Matrix specifier instead of auto. Implicit conversion taking place(?)
    Matrix SLDLT_solution = SLDLT_solver.solve(A.transpose() * B);

    std::cout << "SimplicialLDLT Time Using Matrix: "
              << (std::clock() - start) / (double)(CLOCKS_PER_SEC) << " s"
              << std::endl;
  }
  return 0;

The above code produces the following result:

LSCG Time Using auto: 0.000415 s

LSCG Time Using Matrix: 52.7668 s

SimplicialLDLT Time Using auto: 0.216593 s

SimplicialLDLT Time Using Matrix: 0.239976 s

As the results proove when I use Eigen::MatrixXd instead of auto I get the situation ggael described in one of his comments, namely LSCG not achieving a solution without setting a higher tolerance and SimplicialLDLT being much faster.

Are the solvers actually solving the system when I use auto?
Is the Solver object being implicitly converted to a Matrix Object when I use the Matrix specifier? Since When using LSCG the only change in the first two cases is the use of auto and Matrix respectively,does this implicit conversion to Matrix take 52.7668-0.000415 seconds?
Is there a faster way to extract the solution Matrix from the Solve object?

It seems that LSCG fails to converge if the tolerance it set too low, perhaps because of the squaring. So calling solver.setTolerance(1e-14); prior to solving fixes the issue. Here the computation time reduces to 6s. — ggael
When you use auto, you get an abstract expression of the solution which has not been computed yet. See this. — ggael

ggael ggael · Accepted Answer · 2017-09-02T22:09:40

Given the sparsity pattern of your matrix A, forming the normal equation explicitly and using a direct solver such as SimplicialLDLT will be much faster. Also better use a dense type for B since it is dense anyway and that internally, all solvers will convert the sparse right-hand-side to dense columns:

Eigen::MatrixXd dB = B; // or directly fill dB
Eigen::SimplicialLDLT<SpMatrix> solver;
solver.compute(A.transpose()*A);
MatrixXd x = solver.solve(A.transpose()*dB);

This takes 0.15s, in contrast to 6s for LSCG after the setting the tolerance of LSCG to 1E-14.

Eigen LSCG solver performance issue

Edit

1 Answers