4
votes

In https://math.stackexchange.com/a/2233298/340174 it is mentioned that solving a linear equation "M·x = b" (where M is a square matrix) is slow when done via LU decomposition (and even slower via QR decomposition). I noticed that numpy.linalg.solve does in fact use LU decomposition. In truth, I want to solve "V·x = b" in the least-squares sense for a non-square Vandermonde design matrix V. I don't want regularization. I see multiple approaches (a minimal sketch of the first two follows the list):

  1. Solve "V·x = b" with numpy.linalg.lstsq, which uses the Fortran routine "xGELSD", based on SVD. SVD should be even slower than LU decomposition, but I don't need to calculate "(V^T·V)".
  2. Solve "(V^T·V)·x = (V^T·b)" with numpy.linalg.solve, which uses LU decomposition.
  3. Solve "A·x = (V^T·b)" with numpy.linalg.solve, which uses LU decomposition, but calculate "A = V^T·V" directly according to https://math.stackexchange.com/a/3155891/340174 instead of forming the matrix product.

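For concreteness, here is a minimal sketch of approaches 1 and 2 on made-up data (the sizes and values are illustrative, not from my actual problem):

import numpy as np

# illustrative data: a 200x6 Vandermonde design matrix and a target vector
t = np.linspace(0, 1, 200)
V = np.vander(t, 6)
b = np.sin(2 * np.pi * t)

# approach 1: SVD-based least squares (xGELSD under the hood)
x1, residuals, rank, sv = np.linalg.lstsq(V, b, rcond=None)

# approach 2: normal equations solved via LU decomposition
x2 = np.linalg.solve(V.T @ V, V.T @ b)
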
Alternatively, I could use the newest solve from scipy (https://docs.scipy.org/doc/scipy-1.2.1/reference/generated/scipy.linalg.solve.html), which can use diagonal pivoting for the symmetric matrix "A" (which is faster than LU decomposition, I guess), but my scipy is stuck on 1.1.0, so I don't have access to that.

From https://stackoverflow.com/a/45535523/4533188 it seems that solve is faster than lstsq even when the time for calculating "V^T·V" is included, but when I tried it, lstsq was faster. Maybe I am doing something wrong?

What is the fastest way of solving my linear problem?


No real options

  • statsmodels.regression.linear_model.OLS.fit uses either the Moore-Penrose pseudoinverse or QR factorization + np.linalg.inv + np.linalg.svd + numpy.linalg.solve, which does not seem too efficient to me.
  • sklearn.linear_model.LinearRegression uses scipy.linalg.lstsq.
  • scipy.linalg.lstsq also uses xGELSD.
  • I expect calculating the inverse of "(V^T·V)" to be pretty expensive, so I discarded the direct computation of "x = (V^T·V)^-1·(V^T·b)".
3
A Vandermonde matrix has an analytically expressible inverse. I won't dive in deep, and I don't know about floating-point errors in this case, but this approach should be the fastest. Also, you can look at np.linalg.pinv; this is almost "(V^T·V)^-1·V^T". – bubble
@bubble That's a nice resource, but isn't that proof for square matrices only? The OP mentions a non-square Vandermonde design matrix. – Brenlla

3 Answers

2
votes

I'm going to ignore the Vandermonde part of the question (the comment by bubble points out that it has an analytic inverse) and answer the more general question about other matrices instead.

I think a few things may be getting conflated here, so I'll distinguish the following:

  1. Exact solution of V x = b using LU
  2. Exact solution of V x = b using QR
  3. Least-squares solution of V x = b using QR
  4. Least-squares solution of V x = b using SVD
  5. Exact solution of V^T V x = V^T b using LU
  6. Exact solution of V^T V x = V^T b using Cholesky

The first maths.stackexchange answer you linked to is about cases 1 and 2. When it says LU is slow, it means relative to methods for specific types of matrix, e.g. positive-definite, triangular, banded, ...

But I think you're actually asking about 3-6. The last stackoverflow link states that 6 is faster than 4. As you said, 4 should be slower than 3, but 4 is the only one that works for rank-deficient V. 6 should be faster than 5 in general.

We should make sure that you did 6 rather than 5. To use 6, you'd need to use scipy.linalg.solve with assume_a="pos". Otherwise, you would wind up doing 5.
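
For example (V and b below are placeholder data, just to make the snippet self-contained, not from the question):

import numpy as np
from scipy.linalg import solve

# placeholder data for illustration
t = np.linspace(0, 1, 200)
V = np.vander(t, 6)
b = np.sin(2 * np.pi * t)

A = V.T @ V  # symmetric positive definite for full-rank V
c = V.T @ b

x5 = solve(A, c)                  # case 5: generic LU path
x6 = solve(A, c, assume_a='pos')  # case 6: Cholesky path (xPOSV)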

I haven't found a single function that does 3 in numpy or scipy. The LAPACK routine is xGELS, which doesn't seem to be exposed in scipy. You should be able to do it with scipy.linalg.qr_multiply followed by scipy.linalg.solve_triangular, as sketched below.
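
Here is roughly how that combination could look; this is a sketch assuming V has full column rank (no pivoting), not tested code:

from scipy.linalg import qr_multiply, solve_triangular

def lstsq_qr(V, b):
    # V = Q R; mode='right' computes c @ Q, so feeding b as a row
    # vector yields (Q^T b)^T together with the economic factor R
    qtb, R = qr_multiply(V, b.reshape(1, -1), mode='right')
    # back-substitution against the upper-triangular R: R x = Q^T b
    return solve_triangular(R, qtb.ravel())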

2
votes

Give scipy.linalg.lstsq() a try with lapack_driver='gelsy'!

Let's review the different LAPACK routines for solving linear least squares, and the approaches:

  • xGELSD is based on the SVD, computed by a divide-and-conquer method.
  • xGELSS is also based on the SVD, but computed by a slower QR-iteration-based method.
  • xGELSY is based on a complete orthogonal factorization (QR with column pivoting) and is usually the fastest of the three.

While very tempting, computing and using V^T·V to solve the normal equations is likely not the way to go. Indeed, the precision is endangered by the condition number of that matrix, which is about the square of the condition number of the matrix V. Since Vandermonde matrices tend to be badly ill-conditioned, with the exception of the matrices of the discrete Fourier transform, it can become hazardous... Finally, you may even keep using xGELSD() to avoid problems related to conditioning. If you switch to xGELSY(), consider estimating the error.
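
A quick numerical illustration of that conditioning point (the matrix here is made up; the exact numbers will vary):

import numpy as np

# a made-up Vandermonde matrix; even modest sizes are poorly conditioned
t = np.linspace(0, 1, 50)
V = np.vander(t, 10)

print(np.linalg.cond(V))        # already large for this small example
print(np.linalg.cond(V.T @ V))  # the square of cond(V) in the 2-norm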

0
votes

Some benchmarks following @francis's answer. I turned off check_finite to get the best performance.

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from numpy.linalg import lstsq as np_lstsq
from scipy.linalg import lstsq as sp_lstsq
from scipy.linalg import solve

#custom OLS: solve the normal equations (X^T X) beta = X^T y by LU factorization
def ord_lu(X, y):
    A = X.T @ X
    b = X.T @ y
    beta = solve(A, b, overwrite_a=True, overwrite_b=True,
                 check_finite=False)
    return beta

#load iris dataset for testing
data = load_iris()
iris_df = pd.DataFrame(data=data.data, columns=data.feature_names)
species = pd.get_dummies(data.target)
y = iris_df.pop('sepal length (cm)').to_numpy()
X = np.hstack([iris_df.to_numpy(), species.to_numpy()])

%timeit -n10000 -r10 result = np_lstsq(X, y)
#33.7 µs ± 1.19 µs per loop (mean ± std. dev. of 10 runs, 10000 loops each)

%timeit -n10000 -r10 result = sp_lstsq(X, y, lapack_driver='gelsd', check_finite=False)
#44.9 µs ± 1.05 µs per loop (mean ± std. dev. of 10 runs, 10000 loops each)

%timeit -n10000 -r10 result = sp_lstsq(X, y, lapack_driver='gelsy', check_finite=False)
#13.5 µs ± 661 ns per loop (mean ± std. dev. of 10 runs, 10000 loops each)

%timeit -n10000 -r10 result = sp_lstsq(X, y, lapack_driver='gelss', check_finite=False)
#43.5 µs ± 2.11 µs per loop (mean ± std. dev. of 10 runs, 10000 loops each)

%timeit -n10000 -r10 result = ord_lu(X, y)
#18 µs ± 360 ns per loop (mean ± std. dev. of 10 runs, 10000 loops each)

Based on the results, gelsy is the fastest least-squares driver. For some unknown reason, in SciPy gelsd is even slower than gelss, which shouldn't be the case. But NumPy's lstsq (which also uses gelsd) behaves normally and is significantly faster than SciPy's gelss. The custom function using LU factorization is quite fast, but as @francis said, it is not safe for ill-conditioned matrices.