I'm using the e1071 `svm` function to classify my data. I tried two different ways to do LOOCV. The first one is like this:

# Built-in cross-validation: cross = subSize gives one fold per subject (LOOCV)
svm.model <- svm(mem ~ ., data, kernel = "sigmoid", cost = 7, gamma = 0.009, cross = subSize)

# Reconstruct per-fold predictions from the fold accuracies: a fold accuracy
# of 0 means the held-out label was misclassified, so flip that label
svm.pred <- data$mem
svm.pred[which(svm.model$accuracies == 0 & svm.pred == 'good')] <- NA
svm.pred[which(svm.model$accuracies == 0 & svm.pred == 'bad')] <- 'good'
svm.pred[is.na(svm.pred)] <- 'bad'

conMAT <- table(pred = svm.pred, true = data$mem)
summary(svm.model)

I set `cross = subSize` (the number of subjects) to get LOOCV, but the classification result differs from my manual version of LOOCV, which looks like this:

CMAT <- 0                      # running confusion matrix across folds
CORR <- numeric(subSize)       # per-fold count of correct predictions
for (i in 1:subSize){
  data_Tst <- data[i, 1:dSize]   # hold out subject i
  data_Trn <- data[-i, 1:dSize]  # train on everyone else
  # note: gamma is ignored by the linear kernel
  svm.model1 <- svm(mem ~ ., data = data_Trn, kernel = "linear", cost = 2, gamma = 0.02)
  svm.pred1 <- predict(svm.model1, data_Tst[, -dSize])
  conMAT <- table(pred = svm.pred1, true = data_Tst[, dSize])
  CMAT <- CMAT + conMAT
  CORR[i] <- sum(diag(conMAT))
}

In my opinion, with LOOCV the accuracy should not vary across runs of the code, because the SVM is trained on all the data except one point, repeated until the end of the loop. However, with the `svm` function's `cross` argument, the accuracy differs on every run.

Which way is correct? Thanks for reading this post! :-)


1 Answer


You are using different hyperparameters (cost = 7 vs. 2, gamma = 0.009 vs. 0.02) and different kernels (sigmoid vs. linear) in the two versions. If you want identical results, these must be the same in both.

Also, it depends on how Leave One Out (LOO) is implemented:

  1. Does your LOO method leave one out randomly or as a sliding window over the dataset?

  2. Does your LOO method leave one out from one class at a time or both classes at the same time?

  3. Is the training set always the same, or do you apply a randomisation procedure before splitting into training and testing sets (assuming you have a separate independent testing set)? In that case, the examples you are cross-validating would change on each run.
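As a hedged sketch of a fair comparison (assuming `data`, `mem`, and `subSize` as in your question, with `subSize` equal to `nrow(data)`): run both versions with the same kernel and hyperparameters, and fix the seed, since e1071 assigns rows to cross-validation folds randomly. Something like:

```r
library(e1071)

set.seed(1)  # fixes e1071's random assignment of rows to folds

# Built-in LOOCV: one fold per row, same kernel/cost as the manual loop below
m.cv <- svm(mem ~ ., data, kernel = "linear", cost = 2, cross = nrow(data))
acc.cv <- m.cv$tot.accuracy / 100  # tot.accuracy is reported in percent

# Manual LOOCV with identical settings
hits <- logical(nrow(data))
for (i in seq_len(nrow(data))) {
  m <- svm(mem ~ ., data = data[-i, ], kernel = "linear", cost = 2)
  hits[i] <- predict(m, data[i, ]) == data$mem[i]
}
acc.manual <- mean(hits)

c(builtin = acc.cv, manual = acc.manual)
```

With the settings matched like this, the two accuracies should agree; if they still differ, the discrepancy is coming from the fold construction, not from the model.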