0
votes

PLS regression with sklearn gives me very poor predictions. Once the model is fitted, I cannot find a way to obtain the intercept. Could this be affecting the model's predictions? The score and loading matrices look fine, and so does the layout of the coefficients. In any case, how do I get the intercept from the attributes that are already available?

This code returns the coefficients of the variables.

from pandas import DataFrame
from sklearn.cross_decomposition import PLSRegression

X = DataFrame( {
        'x1': [0.0,1.0,2.0,2.0],
        'x2': [0.0,0.0,2.0,5.0],
        'x3': [1.0,0.0,2.0,4.0],
    }, columns = ['x1', 'x2', 'x3'] )
Y = DataFrame({
        'y': [ -0.2, 1.1, 5.9, 12.3 ],
    }, columns = ['y'] )

def regPLS1(X,Y):
    _COMPS_ = len(X.columns) # all latent variables
    model = PLSRegression(_COMPS_).fit( X, Y )
    return model.coef_

The result is:

regPLS1(X,Y)
>>> array([[ 0.84], [ 2.44], [-0.46]])

Besides these coefficients, the expected intercept is 0.26, which I cannot obtain from the model. What am I doing wrong?

EDIT: The correct prediction is Y_hat, which is exactly the same as the observed Y:

Y_hat = [-0.2  1.1  5.9 12.3]
What about predicting [0, 0, 0, 0]? - MMF
I have edited my question. The predicted value (using 3 latent variables) is exactly equal to the observed value. - JonAnthrax
You have to predict on your data; the snippet you showed us does not do that. - MMF
Using model.predict(X), I obtain: array([[ 2.07322661], [ 3.21992642], [ 5.62383293], [ 8.18301403]]) - JonAnthrax

2 Answers

2
votes

To calculate the intercept use the following:

import numpy

# X, Y and _COMPS_ are the ones defined in the question
plsModel = PLSRegression(_COMPS_).fit( X, Y )

y_intercept = plsModel.y_mean_ - numpy.dot(plsModel.x_mean_ , plsModel.coef_)

I got the formula directly from the R "pls" package:

 BInt[1,,i] <- object$Ymeans - object$Xmeans %*% B[,,i]

I tested this and obtained the same intercepts from R 'pls' and scikit-learn.
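
As a quick sanity check, the formula can also be applied by hand to the question's data. The sketch below uses plain Python and takes the coefficients quoted in the question (0.84, 2.44, -0.46) as given rather than refitting a model. The variable names are illustrative only.

```python
# Data from the question (rows are observations, columns are x1, x2, x3).
X = [[0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0],
     [2.0, 2.0, 2.0],
     [2.0, 5.0, 4.0]]
Y = [-0.2, 1.1, 5.9, 12.3]

# Coefficients as quoted in the question, not refitted here.
coef = [0.84, 2.44, -0.46]

n = len(X)
x_mean = [sum(row[j] for row in X) / n for j in range(len(coef))]
y_mean = sum(Y) / n

# intercept = mean(Y) - mean(X) . coef
intercept = y_mean - sum(m * b for m, b in zip(x_mean, coef))

# Predictions on the raw (uncentered) X.
y_hat = [intercept + sum(x * b for x, b in zip(row, coef)) for row in X]

print(round(intercept, 2))           # 0.26
print([round(v, 2) for v in y_hat])  # [-0.2, 1.1, 5.9, 12.3]
```

The intercept comes out at 0.26, the value the question expected, and the raw-scale predictions reproduce the observed Y.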

1
votes

Based on my reading of the implementation of _PLS, the formula is Y = XB + Err, where model.coef_ is the estimate of B. The predict method appears to use the fitted parameter y_mean_ as the Err term, so I believe that is what you want: use model.y_mean_ instead of model.coef_. Hope this helps!