In https://math.stackexchange.com/a/2233298/340174 it is mentioned that solving a linear equation "M·x = b" (with a square matrix M) is slow when done via LU decomposition (and even slower via QR decomposition). Now I noticed that numpy.linalg.solve does in fact use LU decomposition. What I actually want to solve is "V·x = b" in the least-squares sense, for a non-square Vandermonde design matrix V. I don't want regularization. I see multiple approaches (sketched in code below):
- Solve "V·x = b" with
numpy.linalg.lstsq
, which uses Fortran "xGELSD" based on SVD. The SVD should be even slower than LU decomposition, but I don't need to calculate "(V^T·V)". - Solve "(V^T·V)·x = (V^T·b)" with
numpy.linalg.solve
, which uses LU decomposition. - Solve "A·x = b" with
numpy.linalg.solve
, which uses LU decomposition, but calculating "A=xV^T·V" directly according to https://math.stackexchange.com/a/3155891/340174
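For concreteness, here is a minimal sketch of the three approaches on made-up data (the sample size, polynomial degree, and the use of np.vander to build V are my own assumptions, not part of the original problem):

```python
import numpy as np

# Hypothetical example: m samples, polynomial of degree n-1 (sizes made up).
rng = np.random.default_rng(0)
m, n = 10_000, 8
t = rng.uniform(-1.0, 1.0, size=m)
b = rng.normal(size=m)

# Non-square Vandermonde design matrix V (m x n).
V = np.vander(t, n, increasing=True)

# Approach 1: least squares on V directly (xGELSD / SVD under the hood).
x1 = np.linalg.lstsq(V, b, rcond=None)[0]

# Approach 2: form the normal equations and solve them with LU.
x2 = np.linalg.solve(V.T @ V, V.T @ b)

# Approach 3: build A = V^T·V directly from power sums of t (exploiting the
# Vandermonde structure), then solve the normal equations as before.
s = (t[:, None] ** np.arange(2 * n - 1)).sum(axis=0)   # s[k] = sum_i t_i**k
A = s[np.arange(n)[:, None] + np.arange(n)]            # A[i, j] = s[i + j]
x3 = np.linalg.solve(A, V.T @ b)

print(np.allclose(x1, x2), np.allclose(x1, x3))
```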
Alternatively, I could use the newest scipy.linalg.solve (https://docs.scipy.org/doc/scipy-1.2.1/reference/generated/scipy.linalg.solve.html), which can use diagonal pivoting for the symmetric matrix "A" (which is faster than LU decomposition, I guess), but my scipy is stuck at 1.1.0, so I don't have access to that.
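If a newer scipy were available, the symmetric (in fact positive-definite, when V has full column rank) structure of "A" could be passed via the assume_a keyword. This is only a sketch of how that call would look, not something I have timed:

```python
import numpy as np
from scipy.linalg import solve  # requires a scipy version that supports assume_a

rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=10_000)
b = rng.normal(size=t.size)
V = np.vander(t, 8, increasing=True)

A = V.T @ V
rhs = V.T @ b

# assume_a='sym' uses a symmetric-indefinite solver (diagonal pivoting);
# assume_a='pos' uses Cholesky, valid here because V^T·V is positive definite
# when V has full column rank.
x_sym = solve(A, rhs, assume_a='sym')
x_pos = solve(A, rhs, assume_a='pos')
```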
From https://stackoverflow.com/a/45535523/4533188 it seems that solve is faster than lstsq, even when the time to compute "V^T·V" is included, but when I tried it, lstsq was faster. Maybe I am doing something wrong?
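For reference, this is roughly how I would compare the two (sizes and repetition count are arbitrary, and results will depend on the shape of V and on the BLAS/LAPACK build):

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=100_000)
b = rng.normal(size=t.size)
V = np.vander(t, 10, increasing=True)

def via_lstsq():
    return np.linalg.lstsq(V, b, rcond=None)[0]

def via_normal_equations():
    # include the cost of forming V^T·V and V^T·b in the measurement
    return np.linalg.solve(V.T @ V, V.T @ b)

print("lstsq:", timeit.timeit(via_lstsq, number=20))
print("solve:", timeit.timeit(via_normal_equations, number=20))
```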
What is the fastest way of solving my linear problem?
No real options
- statsmodels.regression.linear_model.OLS.fit uses either the Moore-Penrose pseudoinverse or QR factorization, i.e. np.linalg.inv + np.linalg.svd + numpy.linalg.solve under the hood, which does not seem too efficient to me.
- sklearn.linear_model.LinearRegression uses scipy.linalg.lstsq.
- scipy.linalg.lstsq also uses xGELSD.
- I expect calculating the inverse of "(V^T·V)" to be pretty expensive, so I discarded the direct computation of "x = (V^T·V)^-1·(V^T·b)".
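For completeness, the library-level calls I looked at would be along these lines (an untested sketch; choices like method='pinv' and fit_intercept=False are my assumptions about how one would apply them here):

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=5_000)
b = rng.normal(size=t.size)
V = np.vander(t, 6, increasing=True)

# statsmodels: OLS.fit accepts method='pinv' (default) or method='qr'.
x_sm = sm.OLS(b, V).fit(method='pinv').params

# scikit-learn: LinearRegression delegates to scipy.linalg.lstsq;
# fit_intercept=False because V already contains the constant column.
x_sk = LinearRegression(fit_intercept=False).fit(V, b).coef_
```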
np.linalg.pinv: this is almost "(V^T·V)^-1·V^T" – bubble
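Following up on bubble's comment, the pseudoinverse route would be a one-liner (when V has full column rank, np.linalg.pinv(V) equals "(V^T·V)^-1·V^T", so this gives the same least-squares solution; the data below is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=5_000)
b = rng.normal(size=t.size)
V = np.vander(t, 6, increasing=True)

# Least-squares solution via the Moore-Penrose pseudoinverse (SVD-based).
x = np.linalg.pinv(V) @ b
```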