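To train the model (the code below fits a classification tree with rpart rather than a support vector machine), I need to determine several parameters.
For example, there are parameters such as cp (the complexity parameter) and minsplit (the minimum number of observations in a node before a split is attempted).
I am currently using 10-fold cross-validation to find cp, and I got cp = 0.02.
Below is the code for that: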
library(caTools)
set.seed(3000)
spl = sample.split(dat$Incident.Category, SplitRatio = 0.8)
Train = subset(dat, spl==TRUE)
Test = subset(dat, spl==FALSE)
library(caret)
library(e1071)
library(rpart)  # needed for the direct rpart() call below
# Define cross-validation experiment: 10 folds, cp from 0.01 to 0.5
numFolds = trainControl(method = "cv", number = 10)
cpGrid = expand.grid(.cp = seq(0.01, 0.5, 0.01))
# Tune cp by cross-validated accuracy (prints the results per cp value)
train(Incident.Category ~ Working.Condition + Observation.Type +
        Injury.Potential.Score + Equipment.Damage.Score +
        Safety.Standards + Incident.Type,
      data = Train, method = "rpart", trControl = numFolds, tuneGrid = cpGrid)
# Fit the final tree with the selected cp
CartMOdel = rpart(Incident.Category ~ Working.Condition + Observation.Type +
                    Injury.Potential.Score + Equipment.Damage.Score +
                    Safety.Standards + Incident.Type,
                  data = Train, method = "class", cp = 0.02)
Now I want to know how to use a genetic algorithm (GA) to optimize these parameters. My data is categorical, so I am also unsure how to choose the fitness function.
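
For illustration only, one common choice of fitness function for a categorical response is the cross-validated classification accuracy of the tree built with a candidate (cp, minsplit) pair. The sketch below rests on several assumptions that are not part of the code above: it uses the GA package (if installed), the built-in kyphosis data from rpart as a stand-in for Train, a hypothetical helper named cv_accuracy, and arbitrary search bounds and GA settings.

```r
library(rpart)

# Stand-in data (assumption): rpart's built-in kyphosis set, categorical response.
# Swap in Train and the Incident.Category formula to apply this to the real data.
data(kyphosis, package = "rpart")
form <- Kyphosis ~ Age + Number + Start

# Fix the folds once so every candidate is scored on the same splits.
k <- 5
set.seed(3000)
folds <- sample(rep(seq_len(k), length.out = nrow(kyphosis)))

# Hypothetical helper: k-fold cross-validated accuracy for one (cp, minsplit) pair.
cv_accuracy <- function(cp, minsplit, data, form, folds) {
  response <- all.vars(form)[1]
  acc <- numeric(max(folds))
  for (i in seq_len(max(folds))) {
    fit <- rpart(form, data = data[folds != i, ], method = "class",
                 control = rpart.control(cp = cp, minsplit = minsplit))
    pred <- predict(fit, newdata = data[folds == i, ], type = "class")
    acc[i] <- mean(as.character(pred) == as.character(data[folds == i, response]))
  }
  mean(acc)
}

# Fitness: the GA proposes x = c(cp, minsplit); a higher return value is better.
# minsplit must be an integer, so the real-valued gene is rounded.
fitness <- function(x) {
  cv_accuracy(cp = x[1], minsplit = round(x[2]),
              data = kyphosis, form = form, folds = folds)
}

# Run the search only if the GA package is available.
if (requireNamespace("GA", quietly = TRUE)) {
  res <- GA::ga(type = "real-valued", fitness = fitness,
                lower = c(0.001, 2), upper = c(0.5, 50),  # assumed bounds for cp, minsplit
                popSize = 20, maxiter = 30, seed = 3000, monitor = FALSE)
  print(res@solution)      # best cp and (unrounded) minsplit found
  print(res@fitnessValue)  # cross-validated accuracy at that solution
}
```

Because the predictors being categorical only affects how rpart splits, the fitness function itself does not need to change for them; only the response type matters for choosing accuracy (or another classification metric) as the score. With just two parameters, a plain grid over cp and minsplit is often enough, so the GA here is mainly a demonstration of the setup.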