Why do different methods for solving Xc=y in python give different solution when they should not?

Question

I was trying to solve a linear system Xc=y that was square. The methods I know to solve this are:

using inverse c=<X^-1,y>
using Gaussian elimination
using the pseudo-inverse

It seems as far as I can tell that these don't match what I thought would be the ground truth.

First generate the truth parameters by fitting a polynomial of degree 30 to a cosine with frequency 5. So I have y_truth = X*c_truth.
Then I check if the above three methods match the truth

I tried it but the methods don't seem to match and I don't see why that should be the case.

I produced fully runnable reproducible code:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

## some parameters
degree_target = 25
N_train = degree_target+1
lb,ub = -2000,2000
x = np.linspace(lb,ub,N_train)
## generate target polynomial model
freq_cos = 5
y_cos = np.cos(2*np.pi*freq_cos*x)
c_polyfit = np.polyfit(x,y_cos,degree_target)[::-1] ## needs to me reverse to get highest power last
## generate kernel matrix
poly_feat = PolynomialFeatures(degree=degree_target)
K = poly_feat.fit_transform(x.reshape(N_train,1)) # generates degree 0 first
## get target samples of the function
y = np.dot(K,c_polyfit)
## get pinv approximation of c_polyfit
c_pinv = np.dot( np.linalg.pinv(K), y)
## get Gaussian-Elminiation approximation of c_polyfit
c_GE = np.linalg.solve(K,y)
## get inverse matrix approximation of c_polyfit
i = np.linalg.inv(K)
c_mdl_i = np.dot(i,y)
## check rank to see if its truly invertible
print('rank(K) = {}'.format( np.linalg.matrix_rank(K) ))
## comapre parameters
print('--c_polyfit')
print('||c_polyfit-c_GE||^2 = {}'.format( np.linalg.norm(c_polyfit-c_GE) ))
print('||c_polyfit-c_pinv||^2 = {}'.format( np.linalg.norm(c_polyfit-c_pinv) ))
print('||c_polyfit-c_mdl_i||^2 = {}'.format( np.linalg.norm(c_polyfit-c_mdl_i) ))
print('||c_polyfit-c_polyfit||^2 = {}'.format( np.linalg.norm(c_polyfit-c_polyfit) ))
##
print('--c_GE')
print('||c_GE-c_GE||^2 = {}'.format( np.linalg.norm(c_GE-c_GE) ))
print('||c_GE-c_pinv||^2 = {}'.format( np.linalg.norm(c_GE-c_pinv) ))
print('||c_GE-c_mdl_i||^2 = {}'.format( np.linalg.norm(c_GE-c_mdl_i) ))
print('||c_GE-c_polyfit||^2 = {}'.format( np.linalg.norm(c_GE-c_polyfit) ))
##
print('--c_pinv')
print('||c_pinv-c_GE||^2 = {}'.format( np.linalg.norm(c_pinv-c_GE) ))
print('||c_pinv-c_pinv||^2 = {}'.format( np.linalg.norm(c_pinv-c_pinv) ))
print('||c_pinv-c_mdl_i||^2 = {}'.format( np.linalg.norm(c_pinv-c_mdl_i) ))
print('||c_pinv-c_polyfit||^2 = {}'.format( np.linalg.norm(c_pinv-c_polyfit) ))
##
print('--c_mdl_i')
print('||c_mdl_i-c_GE||^2 = {}'.format( np.linalg.norm(c_mdl_i-c_GE) ))
print('||c_mdl_i-c_pinv||^2 = {}'.format( np.linalg.norm(c_mdl_i-c_pinv) ))
print('||c_mdl_i-c_mdl_i||^2 = {}'.format( np.linalg.norm(c_mdl_i-c_mdl_i) ))
print('||c_mdl_i-c_polyfit||^2 = {}'.format( np.linalg.norm(c_mdl_i-c_polyfit) ))

and I get the result:

rank(K) = 4
--c_polyfit
||c_polyfit-c_GE||^2 = 4.44089220304006e-16
||c_polyfit-c_pinv||^2 = 1.000000000000001
||c_polyfit-c_mdl_i||^2 = 1.1316233165135605e-06
||c_polyfit-c_polyfit||^2 = 0.0
--c_GE
||c_GE-c_GE||^2 = 0.0
||c_GE-c_pinv||^2 = 1.0000000000000007
||c_GE-c_mdl_i||^2 = 1.1316233160694804e-06
||c_GE-c_polyfit||^2 = 4.44089220304006e-16
--c_pinv
||c_pinv-c_GE||^2 = 1.0000000000000007
||c_pinv-c_pinv||^2 = 0.0
||c_pinv-c_mdl_i||^2 = 0.9999988683985006
||c_pinv-c_polyfit||^2 = 1.000000000000001
--c_mdl_i
||c_mdl_i-c_GE||^2 = 1.1316233160694804e-06
||c_mdl_i-c_pinv||^2 = 0.9999988683985006
||c_mdl_i-c_mdl_i||^2 = 0.0
||c_mdl_i-c_polyfit||^2 = 1.1316233165135605e-06

Why is it? Is it a machine precision thing? Or is it because the error accumulates (a lot) when degree is large (greater than 1)? Honestly I don't know but all those hypothesis seem silly to me. If anyone can spot my mistake, feel free to point it out. Otherwise, I might not understand linear algebra or something...which is a lot more worrying.

Also if I can get suggestions for this to work, it would be highly appreciated. Do I:

increase the size of the interval to no be less than 1 (in magnitude)?
what is the largest polynomial size degree I may use?
different language...? Or increase the precision?

any suggestions are appreciated!

as a fun side question, if I were to train my polynomial model with GD or SGD, would I have the same numerical errors if I restrict my model to be in [-1,1] or is this just special to (pseudo) inverting a matrix? — Charlie Parker
Another follow up question. I've been suggested multiple times to just change the interval to say, [-5,5]. However, now my question is, why doesn't the opposite issue arise and then instead of 1.0+eps = 1.0 that we get 1.0+big_number = NaN? if big_number = number^30? Like why do numerical issue seem to be more sensitive when raising to the power of 30 a number with magnitude less than 1? — Charlie Parker

jakevdp jakevdp · Accepted Answer · 2017-10-21T03:53:14

The issue is floating point accuracy. You're raising numbers between zero and one to the 30th power, then adding them together. If you were doing this with infinite precision arithmetic, the methods would recover the inputs. With floating point arithmetic, precision loss means you can't.

Why do different methods for solving Xc=y in python give different solution when they should not?

1 Answers