I am trying to train and test a model using random forest regression. My response variable is numeric, and I have 23 other variables, which are a mix of numeric and character columns. I am using the following block of code:

library(e1071)
library(dplyr)
library(class)
library(caret)
library(kernlab)

data=read.csv(choose.files())


set.seed(1)
mydata=data
n=dim(mydata)[1]
p=dim(mydata)[2]-1
x=mydata[,-3]
y=mydata[,3]

n_train=35
n_test=9

random_order=sample(n)
test_index=random_order[1:n_test]
train_index=random_order[-(1:n_test)]
y_train=y[train_index]
y_test=y[test_index]
x_train=x[train_index,]
x_test=x[test_index,]

traindata=data.frame(x=x_train,y=(y_train))
testdata = data.frame(x=x_test,y=(y_test))

fitControl <- trainControl(## 10-fold CV
  method = "repeatedcv",classProbs=TRUE, 
  number = 10,
  ## repeated ten times
  repeats = 10)

set.seed(1)
newrf=train(y ~ ., data = traindata , method = "rf", 
             trControl = fitControl)

newrf 
bestmodel_rf= newrf$finalModel
ypredcaret=predict(bestmodel_rf, newdata = testdata)
table(predict=ypredcaret, truth=y_test)
plot(newrf)
bestmodel_rf

I am getting the following warning:

Warning message:
In train.default(x, y, weights = w, ...) :
  cannnot compute class probabilities for regression

1 Answer

You've specified classProbs=TRUE in trainControl, which tells caret to compute class probabilities for a classification model (one whose response variable consists of discrete class labels). That setting conflicts with your numeric response variable, which causes train to fit a regression model, so caret warns that it cannot compute class probabilities for regression. Note that this is a warning rather than an error: the model is still trained.

Since your description and numeric response variable indicate this is a regression problem, removing classProbs=TRUE from your code (the default is classProbs=FALSE) should make the warning go away.
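
As a minimal sketch of that change, the snippet below sets up the same repeated cross-validation without classProbs. It uses the built-in mtcars data with mpg as the response purely as a stand-in for your CSV (an assumption, since the original data isn't available), and it predicts from the train object and summarises error with postResample, which is one reasonable choice for regression rather than the only one.

```r
library(caret)  # method = "rf" also needs the randomForest package installed

# Stand-in data (assumption): mtcars, with the numeric column mpg as response
set.seed(1)
test_index <- sample(nrow(mtcars), 6)
traindata  <- mtcars[-test_index, ]
testdata   <- mtcars[test_index, ]

# 10-fold CV repeated ten times; classProbs is left at its default (FALSE)
fitControl <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 10)

set.seed(1)
newrf <- train(mpg ~ ., data = traindata, method = "rf",
               trControl = fitControl)

# Predict on the held-out rows and report regression metrics
ypred <- predict(newrf, newdata = testdata)
postResample(pred = ypred, obs = testdata$mpg)
```

With a numeric response, train should report the model type as regression and tune mtry by RMSE; swapping in your own traindata and response column should work the same way.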