3
votes

To train a support vector machine, we must choose several hyperparameters.

For example, there are parameters such as cost and gamma.

I am trying to determine the sigma and C (cost) parameters of an SVM using the "GA" and "kernlab" packages in R.

I use accuracy as the evaluation function of the genetic algorithm.

I wrote and ran the following code.

library(GA) 
library(kernlab) 
data(spam) 
index <- sample(1:dim(spam)[1]) 
spamtrain <- spam[index[1:floor(dim(spam)[1]/2)], ] 
spamtest <- spam[index[((ceiling(dim(spam)[1]/2)) + 1):dim(spam)[1]], ] 

f <- function(x)
{
  x1 <- x[1]
  x2 <- x[2]
  filter <- ksvm(type~.,data=spamtrain,kernel="rbfdot",kpar=list(sigma=x1),C=x2,cross=3)
  mailtype <- predict(filter,spamtest[,-58])
  t <- table(mailtype,spamtest[,58])
  return(t[1,1]+t[2,2])/(t[1,1]+t[1,2]+t[2,1]+t[2,2])
}

GA <- ga(type = "real-valued", fitness = f, min = c(-5.12, -5.12), max = c(5.12, 5.12), popSize = 50, maxiter = 2) 
summary(GA) 
plot(GA) 

However, when I call the ga function, the following error is returned.

"No Support Vectors found. You may want to change your parameters"

I cannot understand what is wrong with the code.

1
If you only have two parameters to optimise, you don't really need a GA. Define a grid of plausible values, and test your model's fit over the grid. - Hong Ooi
Hello, Hong Ooi! Thank you for the immediate reply. The above code is an ad hoc example. I ultimately plan to use my own custom kernel for the SVM, with five or six parameters. With a grid search, the number of combinations would be enormous, so I want to use a GA to optimize the parameters. - Dai Koga
Are you sure that negative values for sigma and C are meaningful? - Vincent Zoonekynd
Hey, Vincent Zoonekynd! Thank you for the kind reply. I am sorry, the parameter ranges above are incomplete; I gave them only as example code. I will specify proper values for the actual analysis. - Dai Koga
@VincentZoonekynd I am sure they are not. - Marc Claesen

1 Answer

5
votes

Using a GA to tune SVM parameters is not a good idea; a regular grid search should be sufficient (two for loops, one over C values and one over gamma values).

R's e1071 library (which also provides SVMs) has a function, `tune.svm`, that looks for the best parameters using a grid search.

Example

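library(e1071)  # provides svm() and tune.svm()
data(iris)
# Grid search over powers of two for gamma and cost
obj <- tune.svm(Species~., data = iris, sampling = "fix",
                gamma = 2^c(-8,-4,0,4), cost = 2^c(-8,-4,-2,0))
# Plot the tuning error on log2 axes, then as a perspective surface
plot(obj, transform.x = log2, transform.y = log2)
plot(obj, type = "perspective", theta = 120, phi = 45)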

This also shows one important thing: you should search for good C and gamma values on a geometric scale, e.g. 2^x for x in {-10,-8,-6,-4,-2,0,2,4}.
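As a small illustration of the geometric scale described above, the following base R sketch builds such a grid and enumerates every (gamma, cost) pair. The names `exponents`, `grid` and `candidates` are only illustrative, and the exponent range is the one from the example set, not a recommendation.

```r
# Exponents spaced evenly; the parameter values themselves are spaced geometrically.
exponents <- seq(-10, 4, by = 2)
grid <- 2^exponents

# Every (gamma, cost) combination a grid search would evaluate.
candidates <- expand.grid(gamma = grid, cost = grid)

print(grid)
print(nrow(candidates))
```

With 8 values per parameter this gives 64 combinations, which is why a plain grid search is cheap when there are only two parameters.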

A GA is a meta-optimisation algorithm suited to problems where the parameter space is huge and there is no simple relationship between the parameters and the objective function. It requires tuning many more parameters than the SVM itself (number of generations, population size, mutation probability, crossover probability, mutation operator, crossover operator, ...), so it is a completely useless approach here.

And of course, as stated earlier in the comments, C and gamma have to be strictly positive.
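If a GA is used anyway (for instance with the larger custom-kernel parameter set mentioned in the comments), one way to respect both points above is to let the GA search over exponents and convert them with 2^x, so sigma and C can never be zero or negative. The following is only a sketch under several assumptions: the name `fitness`, the exponent bounds, the seeds and the small `popSize`/`maxiter` are illustrative choices, the `tryCatch` fallback to 0 is my own addition, and `lower`/`upper` are the argument names in recent releases of the GA package (the question's code uses the older `min`/`max`). Note also that in R `return(a)/b` returns `a` alone, so the sketch computes accuracy as a single expression.

```r
library(GA)
library(kernlab)

data(spam)
set.seed(42)                          # assumption: fixed seed for a repeatable split
index <- sample(nrow(spam))
half <- floor(nrow(spam) / 2)
spamtrain <- spam[index[1:half], ]
spamtest <- spam[index[(half + 1):nrow(spam)], ]

# x holds log2(sigma) and log2(C); 2^x is always strictly positive.
fitness <- function(x) {
  sigma <- 2^x[1]
  cost <- 2^x[2]
  tryCatch({
    model <- ksvm(type ~ ., data = spamtrain, kernel = "rbfdot",
                  kpar = list(sigma = sigma), C = cost)
    pred <- predict(model, spamtest[, -58])
    mean(pred == spamtest$type)       # accuracy on the held-out half
  }, error = function(e) 0)           # assumption: a failed fit scores 0
}

# Search exponents in [-10, 4] for both parameters (illustrative bounds).
result <- ga(type = "real-valued", fitness = fitness,
             lower = c(-10, -10), upper = c(4, 4),
             popSize = 10, maxiter = 2, monitor = FALSE, seed = 1)

best <- 2^result@solution[1, ]        # back-transform to sigma and C
print(best)
```

Scoring candidates on `spamtest` follows the question's setup; for a real analysis the fitness would more sensibly come from cross-validation on the training data, keeping the test set for a final check.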

For more details about using e1071 take a look at the CRAN document: http://cran.r-project.org/web/packages/e1071/e1071.pdf