
I've created and tuned multiple models, but I run into issues when I try to predict with them. I first run the code below to tune an LDA model.

library(MASS)
library(caret)
library(randomForest)
data(survey)
data<-survey

#create training and test set
split <- createDataPartition(data$W.Hnd, p=.8)[[1]]
train<-data[split,]
test<-data[-split,]


#creating training parameters
control <- trainControl(method = "cv",
                        number = 10, 
                        p =.8, 
                        savePredictions = TRUE, 
                        classProbs = TRUE, 
                        summaryFunction = twoClassSummary)

#fitting and tuning model
lda_tune <- train(W.Hnd ~ . , 
            data=train, 
            method = "glm" ,
            metric = "ROC",
            trControl = control)

However, when I run results <- predict(rf_tune, newdata=test),

the output is only 32 rows, while the test set has 46 rows. This is problematic because I build a data.frame of the test results with the predicted values from multiple models to analyse with a confusion matrix. For instance, when I run this

results<-data.frame(obs = test$W.Hnd, lda = predict(lda_tune, newdata = test))

I get the error Error in $<-.data.frame(tmp, "rf_results", value = c(2L, 2L, 2L, : replacement has 32 rows, data has 46

Can someone explain why caret returns only 32 predicted values when there are clearly 46 rows in the test set, even when I explicitly pass the test set to predict?

Do you mean results <- predict(lda_tune, newdata=test)? – Whitebeard
I think your problem is that test contains missing values. nrow(test[complete.cases(test), ]) gives me 34 – Whitebeard

1 Answer


Running your code resulted in errors on my side: twoClassSummary throws an error. But ignoring that, you first talk about lda_tune and later about rf_tune.

Accounting for these issues, the problem lies with the missing values in your test set. If you check nrow(test[complete.cases(test), ]) you will see that it returns 33 cases, which is exactly the number of predictions predict returns.
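
As a quick check (the counts below are illustrative; your exact numbers depend on the random split):

nrow(test)                                  # e.g. 46 rows in the test set
sum(complete.cases(test))                   # e.g. 33 rows without any NA
length(predict(lda_tune, newdata = test))   # matches the complete-case count, because
                                            # caret's predict drops NA rows by default (na.action = na.omit)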

I added the code below for reference, including rf_tune and lda_tune and their results.

library(MASS)
library(caret)
library(randomForest)
data(survey)
data<-survey

#create training and test set
split <- createDataPartition(data$W.Hnd, p=.8)[[1]]
train<-data[split,]
test<-data[-split,]


#creating training parameters
control <- trainControl(method = "cv",
                        number = 10, 
                        p =.8, 
                        savePredictions = TRUE, 
                        classProbs = TRUE)

#fitting and tuning model
lda_tune <- train(W.Hnd ~ . , 
                  data=train, 
                  method = "glm" ,
                  metric = "ROC",
                  trControl = control)

#fitting a random forest with the same control settings
rf_tune <- train(W.Hnd ~ . , 
                  data=train, 
                  method = "rf" ,
                  metric = "ROC",
                  trControl = control)

#subset the observed labels to complete cases so they align with the dropped-NA predictions
lda_results <- data.frame(obs = test$W.Hnd[complete.cases(test)],
                          lda = predict(lda_tune, newdata = test))
rf_results <- data.frame(obs = test$W.Hnd[complete.cases(test)],
                         rf = predict(rf_tune, newdata = test))
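
If the goal is the confusion-matrix comparison mentioned in the question, something along these lines should then work on the aligned frames (a sketch, nothing tuned further):

confusionMatrix(lda_results$lda, lda_results$obs)
confusionMatrix(rf_results$rf, rf_results$obs)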