1
votes

I have been using the following routines to invert matrices in LAPACK:

DGETRF()
DGETRI()

However, the subroutine DGETRF() performs partial pivoting.

Question: is there any other subroutine in LAPACK/BLAS that could replace DGETRF() but does not do any pivoting? I would also like to know whether there is a subroutine that uses Gaussian elimination for the matrix inversion instead of LU decomposition.

2
Why does the algorithm used by a canned routine make a difference? - Bob Jarvis - Reinstate Monica
It makes a difference because of performance: pivoting is expensive. By writing a Gaussian elimination algorithm for matrix inversion without pivoting, I have been able to cut the computational time to almost 50% of that of the subroutines mentioned. Since BLAS and LAPACK were developed by experts in this subject, we suspect the package may have faster algorithms that do not perform any pivoting. - Bruno F

2 Answers

1
votes

The list of computational routines available in LAPACK can be found here - scroll to the bottom of the page, Table 2.8. As far as I can tell, all methods for general matrices perform inversion via a factorization, so there is no subroutine that uses plain Gaussian elimination to compute the inverse.

Regarding pivoting: unless you really know what you are doing, you should probably not use a method without partial pivoting. Partial pivoting is not terribly expensive in terms of performance, but it provides much-improved numerical stability. As far as I am aware, Gaussian elimination is also generally used with partial pivoting: without it, the algorithm breaks down if any element on the main diagonal becomes zero (for the obvious reason), and choosing the largest available pivot generally improves the numerical accuracy of the solution.

If you are getting better performance from your GE version without pivoting, it is simply because you are trading accuracy and generality for speed (which IMHO is fair game when you really know what you are doing and have a specific use case in mind, but I can see why library authors prefer the more generic implementation).

As an aside, if you are in the performance-comparison game I would, for good measure, also benchmark against the matrix-inversion functions from Intel's MKL, and ensure all compiler optimisation flags are correctly specified (at least -O3 -march=native) before drawing any conclusions.

1
votes

Does your code use C matrix storage (row-major)? LAPACK and BLAS use Fortran storage (column-major). That can make a big difference in terms of cache misses.

Are your matrices small enough to fit in cache? If so, you need to turn on very aggressive optimization at the compiler level, especially relaxed floating-point arithmetic and SIMD vectorization. If your matrices are large then, as suggested by paul-g, use a faster BLAS at the very least, and a vendor LAPACK if available. The reference BLAS is not CPU-optimized; it is just stock Fortran code.