A PLS regression with scikit-learn is giving me very poor predictions. Once the model is fitted, I cannot find a way to obtain the intercept; perhaps this is what is hurting the predictions? The score and loading matrices look fine, and so does the arrangement of the coefficients. In any case, how do I get the intercept from the attributes I already have?
The following code returns the coefficients of the variables:
from pandas import DataFrame
from sklearn.cross_decomposition import PLSRegression

X = DataFrame({
    'x1': [0.0, 1.0, 2.0, 2.0],
    'x2': [0.0, 0.0, 2.0, 5.0],
    'x3': [1.0, 0.0, 2.0, 4.0],
}, columns=['x1', 'x2', 'x3'])

Y = DataFrame({
    'y': [-0.2, 1.1, 5.9, 12.3],
}, columns=['y'])

def regPLS1(X, Y):
    _COMPS_ = len(X.columns)  # use all latent variables
    model = PLSRegression(n_components=_COMPS_).fit(X, Y)
    return model.coef_
The result is:
regPLS1(X,Y)
>>> array([[ 0.84], [ 2.44], [-0.46]])
In addition to these coefficients, the intercept should be 0.26. What am I doing wrong?
EDIT The correct predicted response Y_hat is exactly the observed Y:
Y_hat = [-0.2 1.1 5.9 12.3]