I have a training data set, let's call it "training_data", which consists of 19 variables (features) and 1 label, for a total of 20 variables (columns). This is the data frame that results from feature selection, so it contains only the best predictors: low-variance columns and poor predictors have already been removed. Let's call the label in this data set "final_score".
I also have a test data set, let's call it "predictions_data", that has the same 19 variables (features) but no label variable, so this set has 19 variables (columns) in total.
I'm fitting a very simple regression model, a lasso regression via caret's "train" function, and then want to use it to predict the label ("final_score") for "predictions_data".
My code goes as follows:
# Load caret, which provides trainControl() and train():
library(caret)
# Make sure the training data is a data frame:
training_data <- data.frame(training_data)
# Set the cross-validation folds and repeats:
fitControl <- trainControl(method = "repeatedcv",
                           number = 3,   # number of folds
                           repeats = 3)  # repeated three times
# Train the model using "lasso" regression from the train method. I've called the model "model.cv":
model.cv <- train(final_score ~ .,
                  data = training_data,
                  method = "lasso",
                  trControl = fitControl,
                  preProcess = c('scale', 'center'))
So far everything works fine: the model reports the best results from cross-validation and the metrics obtained (RMSE, MAE, etc.).
Now I want to apply the model to "predictions_data", so that the model can "predict" the final_score.
My code for trying to do this is:
# Import test data set to a data frame (with no label column):
predictions_data <- data.frame(predictions_data)
# Apply the model using caret's predict method, and save the result in an object called "predictions":
predictions <- predict(model.cv, newdata = predictions_data)
And here comes the problem. Even though I stated that newdata = predictions_data, the predictions object returns the predicted labels for the training data set and not for the test data set. What am I doing wrong? (Obviously this is a very basic model, but even so it should still work for predictions, right?)
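To make the symptom concrete, here is a minimal sketch of the kind of sanity check I have in mind, not a fix. It uses only base R so that it runs anywhere: `check_predictions` is a hypothetical helper (not part of caret), the toy data frames are made up, and `lm` is only a stand-in for `model.cv`. The idea is to compare the number of predictions with the number of rows in the new data, and to check whether the predictions are identical to those for the training data.

```r
# Hypothetical helper (not a caret function): checks whether predictions made
# on new data really differ from predictions made on the training data.
check_predictions <- function(model, train_df, new_df, label = "final_score") {
  # Feature columns the model was trained on
  feature_cols <- setdiff(names(train_df), label)
  # Features that are absent from the new data
  missing_cols <- setdiff(feature_cols, names(new_df))
  if (length(missing_cols) > 0) {
    stop("New data is missing columns: ", paste(missing_cols, collapse = ", "))
  }

  preds_new   <- predict(model, newdata = new_df)
  preds_train <- predict(model, newdata = train_df)

  list(
    missing_cols     = missing_cols,
    n_new_rows       = nrow(new_df),
    n_predictions    = length(preds_new),
    # TRUE would reproduce the symptom described above
    same_as_training = length(preds_new) == length(preds_train) &&
      isTRUE(all.equal(as.numeric(preds_new), as.numeric(preds_train)))
  )
}

# Toy data standing in for training_data and predictions_data (assumption)
set.seed(42)
toy_train <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
toy_train$final_score <- 2 * toy_train$x1 - toy_train$x2 + rnorm(30, sd = 0.1)
toy_new <- data.frame(x1 = rnorm(10), x2 = rnorm(10))

# lm() is only a stand-in for the caret model here
toy_model <- lm(final_score ~ ., data = toy_train)

result <- check_predictions(toy_model, toy_train, toy_new)
print(result)
```

With the real objects, the call would presumably be `check_predictions(model.cv, training_data, predictions_data)`. If `n_predictions` equals the number of training rows rather than `nrow(predictions_data)`, that would confirm the behaviour I am describing.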
Thanks in advance!