
I am an R beginner and I have to do 5- or 10-fold cross-validation for a random forest model. My problem is that I have to do the CV manually, not with a package. What I want to do is:

1. Build k folds from my training data.
2. Choose my tuning parameter, for example trees = c(200, 400, 600).
3. Fit my model on k-1 folds and predict the values on the hold-out fold (validation set).
4. Evaluate my prediction on the hold-out fold and save the value.

My evaluation metric should be AUC. I understand the theory, but I have problems doing this in R. Do you have an idea for my code? Thanks so much!

  • It is a classification problem, so I think the iris data set would work here as an alternative.

  • I'm stuck because I don't see how I can fit the model on k-1 folds and predict the values on each validation set. Do I set i = 1, i = 2, and so on? This is what I have already, but it doesn't work:

```r
library(randomForest)
library(verification)   # for roc.area()

# AUC needs a binary outcome, so drop one of the three iris classes
iris2 <- droplevels(iris[iris$Species != "setosa", ])
training.x <- iris2[, 1:4]
training.y <- iris2[, 5]

set.seed(1)
folds <- sample(1:5, nrow(iris2), replace = TRUE)

# tuning grid -- I don't know yet how to use it inside the loop
myGrid <- expand.grid(ntree = c(500, 1000, 2000), mtry = c(2, 4))

err.vect <- rep(NA, 5)
for (i in 1:5) {
  newrf <- randomForest(x = training.x[folds != i, ],
                        y = training.y[folds != i],
                        importance = TRUE, do.trace = 10)
  new.pr <- predict(newrf, training.x[folds == i, ], type = "prob")[, 2]
  err.vect[i] <- roc.area(as.numeric(training.y[folds == i]) - 1, new.pr)$A
  print(paste("AUC for fold", i, ":", err.vect[i]))
}
```
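Since the evaluation metric is AUC, it is worth noting that AUC can also be computed by hand in base R via the rank (Wilcoxon) formula, with no extra package. A small sketch (`manual_auc` is a made-up helper name, not part of any library):

```r
# AUC by the Wilcoxon rank formula: labels are 0/1, scores are the
# predicted probabilities of class 1 (ties handled by midranks)
manual_auc <- function(labels, scores) {
  r  <- rank(scores)      # midranks of all scores
  n1 <- sum(labels == 1)  # number of positives
  n0 <- sum(labels == 0)  # number of negatives
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

manual_auc(c(0, 0, 1, 1), c(0.1, 0.4, 0.35, 0.8))   # 0.75
```

This gives the same area as ROC-based implementations, because AUC equals the probability that a random positive is scored above a random negative.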
  • When you say you "don't know how to start", neither do we. Is your data in R? Can you divide it into groups? Do you know how to subset data? Draw random numbers? Have you searched Stack Overflow for the many, many questions about cross-validation in R? – Gregor Thomas

  • Yes, my data is in R and I can divide it into groups. I already have a training and a test set, and I know how to run my random forest. My main problem is that I don't know how to tune my hyperparameters, or how to fit a model on k-1 folds and then predict the values on the validation set. – Theo87

  • What don't you know about that? If you don't know how to tune hyperparameters in principle, I'd suggest a textbook. Introduction to Statistical Learning in R is standard; see pages 181-194, which explain how to tune hyperparameters with cross-validation. If you understand the algorithm and are having problems implementing it, see the link in my first comment, which has an example; show what you have so far (maybe on a toy data set like mtcars) and try to explain where you're stuck. – Gregor Thomas

  • Because the algorithm is fairly simple: you need to assign folds. Do you know how to draw random numbers? How to make a new column? Then you do a for loop, one iteration per fold. Do you know how for loops work? In the loop, subset your data to all but one fold. Is that the issue? Fit the model on the subset. I think you know how to do this, because it's the one thing you show in the question. Then see how it performs on the hold-out fold. Is this where you get stuck? etc. – Gregor Thomas

  • When you don't show anything, I can't tell whether we need to explain every detail, like adding a column and subsetting data, or whether you already have that and I don't know why you're not showing it. This makes the question Too Broad. If we start from scratch and show you everything, you're asking for a 10-page tutorial, not a quick answer. – Gregor Thomas
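The fold-by-fold recipe described in these comments (assign folds with random numbers, loop over folds, subset to k-1 folds, fit, evaluate the hold-out) can be sketched in base R on mtcars, the toy data set suggested above, using a plain linear model so no extra packages are needed:

```r
set.seed(42)
k <- 5
# assign each row of mtcars to one of k folds
folds <- sample(rep(1:k, length.out = nrow(mtcars)))

fold_mse <- rep(NA, k)
for (i in 1:k) {
  train   <- mtcars[folds != i, ]            # all but fold i
  holdout <- mtcars[folds == i, ]            # fold i only
  fit  <- lm(mpg ~ wt + hp, data = train)    # fit on k-1 folds
  pred <- predict(fit, newdata = holdout)    # predict the hold-out fold
  fold_mse[i] <- mean((holdout$mpg - pred)^2)
}
mean(fold_mse)   # cross-validated estimate of test error
```

The same skeleton works for any model: only the `lm()` call, the prediction, and the per-fold error metric change.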

1 Answer

```r
# Code for 10-fold cross-validation: adjust the variables and data frame to yours
library(MASS)   # Boston data set
library(boot)   # cv.glm()

set.seed(17)
cv.error <- rep(0, 9)
for (i in 1:9) {
  # fit a polynomial regression of degree i
  glm.fit <- glm(medv ~ poly(lstat, i), data = Boston)
  # 10-fold CV estimate of its test error
  cv.error[i] <- cv.glm(Boston, glm.fit, K = 10)$delta[1]
}
cv.error
plot(cv.error, type = "b")
```
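Note that cv.glm hides the fold loop, while the question asks for a manual implementation. The same search over the tuning parameter (here the polynomial degree) can be written entirely by hand; a sketch using only base R plus the MASS package for the Boston data:

```r
library(MASS)   # Boston data set

set.seed(17)
k <- 10
# assign each row to one of k folds
folds <- sample(rep(1:k, length.out = nrow(Boston)))

cv.error <- rep(0, 9)
for (d in 1:9) {                    # tuning parameter: polynomial degree
  fold_mse <- rep(NA, k)
  for (i in 1:k) {                  # manual 10-fold CV for this degree
    fit  <- lm(medv ~ poly(lstat, d), data = Boston[folds != i, ])
    pred <- predict(fit, newdata = Boston[folds == i, ])
    fold_mse[i] <- mean((Boston$medv[folds == i] - pred)^2)
  }
  cv.error[d] <- mean(fold_mse)
}
which.min(cv.error)   # degree with the lowest CV error
```

Swapping `lm()` for `randomForest()`, the degree loop for a loop over `ntree` values, and MSE for AUC gives the structure the question asks for.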