
I am trying to train a random forest model with caret, but I'm getting the following error. Is there a different setting I need for a classification model to resolve the RMSE issue? I tried converting "good" to a factor, but that threw a new error.

Error:

Error in train.default(x, y, weights = w, ...) : 
  Metric RMSE not applicable for classification models 
5 stop(paste("Metric", metric, "not applicable for classification models")) 
4 train.default(x, y, weights = w, ...) 
3 train(x, y, weights = w, ...) 
2 train.formula(good ~ ., data = train, method = "rf", trControl = trainControl(method = "cv", 
    5), ntree = 251) 
1 train(good ~ ., data = train, method = "rf", trControl = trainControl(method = "cv", 
    5), ntree = 251) 

The code I'm using to train the model is below. I'm trying to classify the records in the dataset as "good" based on the values of variable1, variable2 and variable3.

Code:

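library(caret)  # provides createDataPartition(), train() and trainControl()

set.seed(13518) # For reproducibility
inTrain <- createDataPartition(SampleTestData$good, p = 0.70, list = FALSE)
train <- SampleTestData[inTrain, ]
test_train <- SampleTestData[-inTrain, ]

# Fit only once per session; reuse model1 if it already exists
if (!exists("model1")) {
  model1 <- train(good ~ ., data = train, method = "rf",
                  trControl = trainControl(method = "cv", number = 5), ntree = 251)
}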

I've included some sample data below, exported with dput().

Data:

structure(list(good = c("True", "True", "True", "False", "False", 
"True", "True", "True", "True", "False", "True", "True", "True", 
"True", "False", "False", "False", "True", "False", "False", 
"True", "False", "True", "False", "True", "False", "True", "True", 
"False", "False", "True", "True", "False", "True", "True", "True", 
"True", "False", "False", "False", "False", "True", "False", 
"True", "True", "True", "False", "True", "False", "True", "False", 
"True", "True", "True", "False", "False", "True", "False", "True"
), variable1 = c("TRUE", "TRUE", "TRUE", "TRUE", 
"FALSE", "TRUE", "FALSE", "TRUE", "TRUE", "FALSE", "FALSE", "TRUE", 
"TRUE", "TRUE", "TRUE", "TRUE", "FALSE", "TRUE", "FALSE", "TRUE", 
"TRUE", "FALSE", "TRUE", "TRUE", "TRUE", "FALSE", "FALSE", "FALSE", 
"TRUE", "TRUE", "TRUE", "TRUE", "TRUE", "TRUE", "TRUE", "TRUE", 
"TRUE", "TRUE", "TRUE", "TRUE", "TRUE", "TRUE", "FALSE", "FALSE", 
"TRUE", "TRUE", "TRUE", "FALSE", "TRUE", "TRUE", "FALSE", "TRUE", 
"TRUE", "TRUE", "TRUE", "TRUE", "TRUE", "FALSE", "TRUE"), variable2 = c("TRUE", 
"TRUE", "TRUE", "TRUE", "FALSE", "TRUE", "TRUE", "TRUE", "TRUE", 
"FALSE", "TRUE", "TRUE", "TRUE", "FALSE", "TRUE", "TRUE", "FALSE", 
"TRUE", "FALSE", "TRUE", "FALSE", "FALSE", "TRUE", "TRUE", "TRUE", 
"FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE", "TRUE", "TRUE", 
"TRUE", "TRUE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE", "FALSE", 
"TRUE", "FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", 
"TRUE", "TRUE", "FALSE", "TRUE", "TRUE", "TRUE", "TRUE", "TRUE", 
"FALSE", "FALSE", "TRUE"), variable3 = c("FALSE", "FALSE", 
"FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE", 
"FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", 
"FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", 
"FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", 
"FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", 
"FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE", 
"FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", 
"TRUE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", 
"TRUE")), .Names = c("good", "variable1", "variable2", 
"variable3"), class = "data.frame", row.names = c(5078L, 
5087L, 5366L, 5568L, 7017L, 8123L, 8145L, 8525L, 11777L, 12355L, 
12586L, 12675L, 14912L, 15503L, 15530L, 15533L, 15598L, 15634L, 
15749L, 15842L, 16216L, 16718L, 16744L, 16792L, 17928L, 20351L, 
20417L, 21083L, 22382L, 23698L, 23807L, 23879L, 23900L, 30431L, 
30897L, 31084L, 31803L, 32007L, 32806L, 37487L, 37656L, 38284L, 
38291L, 38471L, 38786L, 40303L, 40724L, 41222L, 41248L, 41837L, 
42994L, 44423L, 45216L, 46233L, 47012L, 50446L, 52429L, 53197L, 
54590L))

Answer:


Converting good to a factor does seem to solve the problem. All of the variables in the data set hold TRUE or FALSE values, but they are stored as character vectors. So why does caret's train() with method = "rf" fall back to the regression metric (RMSE) instead of treating this as a classification problem?
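A hedged reading of the error: caret documents the default for train() as metric = ifelse(is.factor(y), "Accuracy", "RMSE"), so the default metric depends only on whether the outcome is a factor. A character column like good is not a factor, so RMSE is selected, while the error text shows caret still set the model up as a classification model, and the two choices clash. The sketch below reproduces only that default rule in base R; default_caret_metric() is a hypothetical helper, not a caret function.

```r
# Hypothetical helper (not part of caret) that mirrors the documented default
# of caret's train(): metric = ifelse(is.factor(y), "Accuracy", "RMSE").
# It does not call caret; it only shows which branch a given outcome takes.
default_caret_metric <- function(y) {
  ifelse(is.factor(y), "Accuracy", "RMSE")
}

good_chr <- c("True", "False", "True")  # character, as in the dput output
good_fac <- as.factor(good_chr)         # converted, as in the answer

default_caret_metric(good_chr)  # character is not a factor, so "RMSE"
default_caret_metric(good_fac)  # factor, so "Accuracy"
```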

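Code that solved the issue:

# Convert the outcome before calling createDataPartition() so that train and test_train both inherit the factor
SampleTestData$good <- as.factor(SampleTestData$good)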
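Applying the same fix to the whole data frame, a minimal base-R sketch: sample_df is a hypothetical four-row stand-in with the same column names and string encoding as the dput output above (it is not the real data). The outcome becomes a factor with explicit levels, and the predictors become logical vectors; the predictor step is optional and only makes the TRUE/FALSE encoding explicit.

```r
# Hypothetical stand-in for SampleTestData: same column names and the same
# "True"/"False" and "TRUE"/"FALSE" character encoding as the dput output.
sample_df <- data.frame(
  good      = c("True", "False", "True", "False"),
  variable1 = c("TRUE", "FALSE", "TRUE", "TRUE"),
  variable2 = c("FALSE", "FALSE", "TRUE", "TRUE"),
  variable3 = c("FALSE", "TRUE", "FALSE", "FALSE"),
  stringsAsFactors = FALSE
)

# Outcome: a factor with fixed level order, so the task is classification.
sample_df$good <- factor(sample_df$good, levels = c("False", "True"))

# Predictors (optional): turn the "TRUE"/"FALSE" strings into logical vectors.
pred_cols <- c("variable1", "variable2", "variable3")
sample_df[pred_cols] <- lapply(sample_df[pred_cols], as.logical)

str(sample_df)
```

With good stored as a factor, train() should pick Accuracy as its default metric; adding metric = "Accuracy" to the train() call states that intent explicitly.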