I've created and tuned multiple models, but I run into issues when I try to predict with them. I first run the following code to tune an LDA model.
library(MASS)
library(caret)
library(randomForest)
data(survey)
data <- survey
# create training and test sets
split <- createDataPartition(data$W.Hnd, p = .8)[[1]]
train <- data[split, ]
test <- data[-split, ]
# create training parameters
control <- trainControl(method = "cv",
                        number = 10,
                        p = .8,
                        savePredictions = TRUE,
                        classProbs = TRUE,
                        summaryFunction = twoClassSummary)
# fit and tune the model
lda_tune <- train(W.Hnd ~ .,
                  data = train,
                  method = "glm",
                  metric = "ROC",
                  trControl = control)
However, when I run
results <- predict(rf_tune, newdata=test)
the output has only 32 rows, although the test set has 46 rows. This is a problem because I build a data.frame of the test results with the predicted values from multiple models, which I then analyze with a confusion matrix. For instance, when I run
results<-data.frame(obs = test$W.Hnd, lda = predict(lda_tune, newdata = test))
I get the error
Error in $<-.data.frame(tmp, "rf_results", value = c(2L, 2L, 2L, :
replacement has 32 rows, data has 46
Can someone explain why caret returns 32 predicted values when there are clearly 46 rows to predict, even when I explicitly ask the model to predict on the test set?
results <- predict(lda_tune, newdata=test)? – Whitebeard
test contains missing values. nrow(test[complete.cases(test), ]) gives me 34. – Whitebeard
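Following up on the missing-values comment, here is a minimal sketch of one way to keep the predictions aligned with the test set. It assumes the mismatch comes from rows with NA being dropped at prediction time (caret's predict method for train objects takes an na.action argument that, as far as I know, defaults to na.omit). The seed, the 5-fold control, the object names (dat, train_set, test_set, fit, keep) and the glm_pred column are illustrative choices, not part of the original code.

```r
library(MASS)   # survey data
library(caret)

data(survey)
set.seed(1)  # assumption: the original code sets no seed

# Drop rows with a missing outcome, then split as in the question
dat <- survey[!is.na(survey$W.Hnd), ]
split <- createDataPartition(dat$W.Hnd, p = 0.8)[[1]]
train_set <- na.omit(dat[split, ])   # fit on complete training rows only
test_set  <- dat[-split, ]           # may still contain NA in predictors

fit <- train(W.Hnd ~ ., data = train_set, method = "glm",
             trControl = trainControl(method = "cv", number = 5))

# Rows of the test set that have no missing values
keep <- complete.cases(test_set)

# Predict on complete rows only, then pad back to the full length with NA
pred <- rep(NA_character_, nrow(test_set))
pred[keep] <- as.character(predict(fit, newdata = test_set[keep, ]))

results <- data.frame(
  obs      = test_set$W.Hnd,
  glm_pred = factor(pred, levels = levels(test_set$W.Hnd))
)
```

With this layout results has one row per test row, and rows that could not be predicted show NA instead of silently disappearing. If you only want scored rows, build the data frame from test_set[keep, ] instead.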