4
votes

In https://math.stackexchange.com/a/2233298/340174 it is mentioned that solving a linear equation "M·x = b" (where M is a square matrix) is slow when done via LU decomposition (and even slower via QR decomposition). I noticed that numpy.linalg.solve does in fact use LU decomposition. In truth, I want to solve "V·x = b" in the least-squares sense for a non-square Vandermonde design matrix V. I don't want regularization. I see multiple approaches (a minimal sketch of the first two follows the list):

  1. Solve "V·x = b" with numpy.linalg.lstsq, which uses the Fortran routine "xGELSD", based on SVD. SVD should be even slower than LU decomposition, but I don't need to calculate "(V^T·V)".
  2. Solve "(V^T·V)·x = (V^T·b)" with numpy.linalg.solve, which uses LU decomposition.
  3. Solve "A·x = (V^T·b)" with numpy.linalg.solve, which uses LU decomposition, but calculate "A = V^T·V" directly according to https://math.stackexchange.com/a/3155891/340174 instead of forming the matrix product.

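For concreteness, here is a minimal sketch of approaches 1 and 2 on made-up data (the sizes and values are illustrative, not from my actual problem):

import numpy as np

# illustrative data: a 200x6 Vandermonde design matrix and a target vector
t = np.linspace(0, 1, 200)
V = np.vander(t, 6)
b = np.sin(2 * np.pi * t)

# approach 1: SVD-based least squares (xGELSD under the hood)
x1, residuals, rank, sv = np.linalg.lstsq(V, b, rcond=None)

# approach 2: normal equations solved via LU decomposition
x2 = np.linalg.solve(V.T @ V, V.T @ b)
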
Alternatively, I could use the newest solve from scipy (https://docs.scipy.org/doc/scipy-1.2.1/reference/generated/scipy.linalg.solve.html), which can use diagonal pivoting for the symmetric matrix "A" (which is faster than LU decomposition, I guess), but my scipy is stuck on 1.1.0, so I don't have access to that.

From https://stackoverflow.com/a/45535523/4533188 it seems that solve is faster than lstsq even when the time for calculating "V^T·V" is included, but when I tried it, lstsq was faster. Maybe I am doing something wrong?

What is the fastest way of solving my linear problem?


No real options

  • statsmodels.regression.linear_model.OLS.fit uses either the Moore-Penrose pseudoinverse or QR factorization + np.linalg.inv + np.linalg.svd + numpy.linalg.solve, which does not seem too efficient to me.
  • sklearn.linear_model.LinearRegression uses scipy.linalg.lstsq.
  • scipy.linalg.lstsq also uses xGELSD.
  • I expect calculating the inverse of "(V^T·V)" to be pretty expensive, so I discarded the direct computation of "x = (V^T·V)^-1·(V^T·b)".
3
A Vandermonde matrix has an analytically expressible inverse. I won't dive in deep, and I don't know about floating-point errors in this case, but this approach should be the fastest. Also, you can look at np.linalg.pinv; this is almost "(V^T·V)^-1·V^T". – bubble
@bubble That's a nice resource, but isn't that proof for square matrices only? The OP mentions a non-square Vandermonde design matrix. – Brenlla

3 Answers

2
votes

I'm going to ignore the Vandermonde part of the question (the comment by bubble points out that it has an analytic inverse) and answer the more general question about other matrices instead.

I think a few things may be getting conflated here, so I'll distinguish the following:

  1. Exact solution of V x = b using LU
  2. Exact solution of V x = b using QR
  3. Least-squares solution of V x = b using QR
  4. Least-squares solution of V x = b using SVD
  5. Exact solution of V^T V x = V^T b using LU
  6. Exact solution of V^T V x = V^T b using Cholesky

The first maths.stackexchange answer you linked to is about cases 1 and 2. When it says LU is slow, it means relative to methods for specific types of matrix, e.g. positive-definite, triangular, banded, ...

But I think you're actually asking about 3-6. The last stackoverflow link states that 6 is faster than 4. As you said, 4 should be slower than 3, but 4 is the only one that works for rank-deficient V. 6 should be faster than 5 in general.

We should make sure that you did 6 rather than 5. To use 6, you'd need to use scipy.linalg.solve with assume_a="pos". Otherwise, you would wind up doing 5.
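
For example (V and b below are placeholder data, just to make the snippet self-contained, not from the question):

import numpy as np
from scipy.linalg import solve

# placeholder data for illustration
t = np.linspace(0, 1, 200)
V = np.vander(t, 6)
b = np.sin(2 * np.pi * t)

A = V.T @ V  # symmetric positive definite for full-rank V
c = V.T @ b

x5 = solve(A, c)                  # case 5: generic LU path
x6 = solve(A, c, assume_a='pos')  # case 6: Cholesky path (xPOSV)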

I haven't found a single function that does 3 in numpy or scipy. The LAPACK routine is xGELS, which doesn't seem to be exposed in scipy. You should be able to do it with scipy.linalg.qr_multiply followed by scipy.linalg.solve_triangular, as sketched below.
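
Here is roughly how that combination could look; this is a sketch assuming V has full column rank (no pivoting), not tested code:

from scipy.linalg import qr_multiply, solve_triangular

def lstsq_qr(V, b):
    # V = Q R; mode='right' computes c @ Q, so feeding b as a row
    # vector yields (Q^T b)^T together with the economic factor R
    qtb, R = qr_multiply(V, b.reshape(1, -1), mode='right')
    # back-substitution against the upper-triangular R: R x = Q^T b
    return solve_triangular(R, qtb.ravel())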

2
votes

Give scipy.linalg.lstsq() a try with lapack_driver='gelsy'!

Let's review the different LAPACK routines for solving linear least squares, and the approaches:

  • xGELSD is based on the SVD, computed by a divide-and-conquer method.
  • xGELSS is also based on the SVD, but computed by a slower QR-iteration-based method.
  • xGELSY is based on a complete orthogonal factorization (QR with column pivoting) and is usually the fastest of the three.

While very tempting, computing and using V^T·V to solve the normal equations is likely not the way to go. Indeed, the precision is endangered by the condition number of that matrix, which is about the square of the condition number of the matrix V. Since Vandermonde matrices tend to be badly ill-conditioned, with the exception of the matrices of the discrete Fourier transform, it can become hazardous... Finally, you may even keep using xGELSD() to avoid problems related to conditioning. If you switch to xGELSY(), consider estimating the error.
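
A quick numerical illustration of that conditioning point (the matrix here is made up; the exact numbers will vary):

import numpy as np

# a made-up Vandermonde matrix; even modest sizes are poorly conditioned
t = np.linspace(0, 1, 50)
V = np.vander(t, 10)

print(np.linalg.cond(V))        # already large for this small example
print(np.linalg.cond(V.T @ V))  # the square of cond(V) in the 2-norm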

0
votes

Some benchmarks following @francis's answer. I turned off check_finite to get the best performance.

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from numpy.linalg import lstsq as np_lstsq
from scipy.linalg import lstsq as sp_lstsq
from scipy.linalg import solve

#custom OLS: solve the normal equations (X^T X) beta = X^T y by LU factorization
def ord_lu(X, y):
    A = X.T @ X
    b = X.T @ y
    beta = solve(A, b, overwrite_a=True, overwrite_b=True,
                 check_finite=False)
    return beta

#load iris dataset for testing
data = load_iris()
iris_df = pd.DataFrame(data=data.data, columns=data.feature_names)
species = pd.get_dummies(data.target)
y = iris_df.pop('sepal length (cm)').to_numpy()
X = np.hstack([iris_df.to_numpy(), species.to_numpy()])

%timeit -n10000 -r10 result = np_lstsq(X, y)
#33.7 µs ± 1.19 µs per loop (mean ± std. dev. of 10 runs, 10000 loops each)

%timeit -n10000 -r10 result = sp_lstsq(X, y, lapack_driver='gelsd', check_finite=False)
#44.9 µs ± 1.05 µs per loop (mean ± std. dev. of 10 runs, 10000 loops each)

%timeit -n10000 -r10 result = sp_lstsq(X, y, lapack_driver='gelsy', check_finite=False)
#13.5 µs ± 661 ns per loop (mean ± std. dev. of 10 runs, 10000 loops each)

%timeit -n10000 -r10 result = sp_lstsq(X, y, lapack_driver='gelss', check_finite=False)
#43.5 µs ± 2.11 µs per loop (mean ± std. dev. of 10 runs, 10000 loops each)

%timeit -n10000 -r10 result = ord_lu(X, y)
#18 µs ± 360 ns per loop (mean ± std. dev. of 10 runs, 10000 loops each)

Based on the results, gelsy is the fastest least-squares driver. For some unknown reason, in SciPy gelsd is even slower than gelss, which shouldn't be the case. But NumPy's lstsq (which also uses gelsd) behaves normally and is significantly faster than SciPy's gelss. The custom function using LU factorization is quite fast, but as @francis said, it is not safe for ill-conditioned matrices.