0
votes

Usually we split the original feature and target data (X, y) into (X_train, y_train) and (X_test, y_test).

By using the method:

mae_A = cross_val_score(clf, X_train_scaled, y_train, scoring="neg_mean_absolute_error", cv=kfold)

I get the cross validation Mean Absolute Error (MAE) for the (X_train, y_train), right?

So, how can I get the MAE for (X_test, y_test) using the models obtained from cross-validation on (X_train, y_train)?

Thank you very much!

Maicon P. Lourenço

2
Usually, you don't do cross-validation for train and test separately. You do it on the whole data set. – DollarAkshay
If instead of kfold you pass cv an iterable yielding (train, test) splits as arrays of indices, your model will train on the train indices and produce a score for the test indices. – Sergey Bushmanov

2 Answers

2
votes

This is the correct approach. As a rule, you should only train your model using training data. Thus the test set should remain unseen during the cross-validation process, i.e. it should not influence the model's hyperparameters; otherwise you could bias the results by leaking knowledge from the test sample into the model.

I get the cross validation Mean Absolute Error (MAE) for the (X_train, y_train), right?

Yes, the error reported by cross_val_score comes only from the training data. The idea is that once you are satisfied with the results of cross_val_score, you fit the final model on the whole training set and make predictions on the test set. To evaluate those predictions you can use sklearn.metrics. For instance, to obtain the MAE:

from sklearn.metrics import mean_absolute_error as mae
test_mae = mae(y_test, y_pred)
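Putting the whole workflow together, a minimal sketch might look like the following (the synthetic data from make_regression and the Ridge regressor are placeholders for your own data and estimator; the scores returned for "neg_mean_absolute_error" are negated, so flip the sign to read them as MAE):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Hypothetical data standing in for your (X, y)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = Ridge()
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

# Cross-validated MAE, computed on the training data only
cv_mae = -cross_val_score(clf, X_train, y_train,
                          scoring="neg_mean_absolute_error", cv=kfold)
print("CV MAE (train):", cv_mae.mean())

# Final model: fit once on the full training set, then score the held-out test set
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Test MAE:", mean_absolute_error(y_test, y_pred))
```

The test-set MAE is computed exactly once, after all model choices have been made, so it stays an unbiased estimate of generalization error.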
0
votes

Try this:

(assuming you have data x and y; cross_val_score fits the model internally on each fold, so you don't need to call fit(x, y) yourself)

from sklearn import linear_model
from sklearn.model_selection import cross_val_score
reg = linear_model.LinearRegression()
scoring = 'neg_mean_absolute_error'
mae = cross_val_score(reg, x, y, cv=5, scoring=scoring)
mae
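A runnable version of this sketch, with synthetic data from make_regression standing in for your x and y. Note that "neg_mean_absolute_error" returns negated errors (so that higher is better), so negate the scores to read them as MAE:

```python
from sklearn import linear_model
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score

# Placeholder data for the x, y in the answer above
x, y = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=42)

reg = linear_model.LinearRegression()
scores = cross_val_score(reg, x, y, cv=5, scoring="neg_mean_absolute_error")
print("MAE per fold:", -scores)   # five non-negative values
print("Mean MAE:", -scores.mean())
```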