Let me start by saying that I have read many posts on cross-validation, and it seems there is much confusion out there. My understanding is simply this:
- Perform k-fold cross-validation (e.g. k = 10) to estimate the average error across the 10 folds.
- If that error is acceptable, then train the model on the complete data set (a minimal sketch of these two steps follows below).
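To make the workflow concrete, here is a minimal sketch of the two steps, assuming the same hypothetical data frame mydat and classification response resp that appear in my code below. The fold split is done by hand only to show what the two bullets mean; caret automates all of this.

library(rpart)

set.seed(42)
k <- 10
# randomly assign each row of mydat to one of k folds
folds <- sample(rep(1:k, length.out = nrow(mydat)))

fold_error <- sapply(1:k, function(i) {
  train_dat <- mydat[folds != i, ]
  test_dat  <- mydat[folds == i, ]
  fit  <- rpart(resp ~ ., data = train_dat, method = "class")
  pred <- predict(fit, test_dat, type = "class")
  mean(pred != test_dat$resp)   # misclassification rate on the held-out fold
})

mean(fold_error)   # average CV error across the 10 folds

# if that error is acceptable, refit on the complete data set
final_fit <- rpart(resp ~ ., data = mydat, method = "class")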
I am attempting to build a decision tree using rpart in R, taking advantage of the caret package. Below is the code I am using.
# load libraries
library(caret)
library(rpart)
# define training control
train_control <- trainControl(method = "cv", number = 10)
# train the model (caret tunes rpart's complexity parameter cp during resampling)
model <- train(resp ~ ., data = mydat, trControl = train_control, method = "rpart")
# make predictions on the full data set
predictions <- predict(model, newdata = mydat)
# append predictions
mydat <- cbind(mydat, predictions)
# summarize results (named conf_mat so it does not mask caret's confusionMatrix function)
conf_mat <- confusionMatrix(mydat$predictions, mydat$resp)
I have one question regarding caret's train function. I have read the train section of A Short Introduction to the caret Package, which states that the "optimal parameter set" is determined during the resampling process.
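From what I can tell, the parameter being tuned for method = "rpart" is the complexity parameter cp. If that is right, I assume the following would show which values were tried on the model object from my code above and which one was selected:

print(model)       # resampling profile: the cp values tried and their CV accuracy
model$results      # the same summary as a data frame
model$bestTune     # the cp value caret selected
model$finalModel   # the rpart tree refit on all of mydat with the selected cp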
In my example, have I coded it up correctly? Do I need to define the rpart tuning parameters within my code, or is my code sufficient as written?
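In case it helps to show what I mean by "defining the rpart parameters", I assume it would look something like the sketch below (the cp grid values are made up purely for illustration):

# explicitly supply a grid of cp values instead of relying on the default
cp_grid <- expand.grid(cp = seq(0.001, 0.05, by = 0.005))
model_grid <- train(resp ~ ., data = mydat,
                    trControl = train_control,
                    method = "rpart",
                    tuneGrid = cp_grid)

# or simply ask caret to evaluate more candidate cp values
model_len <- train(resp ~ ., data = mydat,
                   trControl = train_control,
                   method = "rpart",
                   tuneLength = 10)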