
I am using the e1071 library to train an SVM model in R, where I change the cost parameter and observe the resulting number of support vectors.

library("e1071")
library("mlbench")
data(Glass, package="mlbench")
svm.model <- svm(Type ~ ., data = Glass, cost = 0.00100, gamma= 1)
sum(svm.model$nSV)
#[1] 208
svm.model <- svm(Type ~ ., data = Glass, cost = 1, gamma= 1)
sum(svm.model$nSV)
#[1] 183
svm.model <- svm(Type ~ ., data = Glass, cost = 100000, gamma= 1)
sum(svm.model$nSV)
#[1] 172
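
As a side note, nSV in e1071 is a per-class count, so sum(svm.model$nSV) is the same quantity as the tot.nSV field used further below. A minimal sketch, reusing the cost = 1 model from above:

svm.model <- svm(Type ~ ., data = Glass, cost = 1, gamma = 1)
svm.model$nSV       # number of support vectors per class
svm.model$tot.nSV   # total, identical to sum(svm.model$nSV)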

My question is the following: is the cost parameter here equivalent to the C parameter in the dual Lagrangian formulation of the soft-margin SVM? If those parameters are the same, shouldn't we observe an increasing number of support vectors?

"However, it is critical here, as in any regularization scheme, that a proper value is chosen for C, the penalty factor. If it is too large, we have a high penalty for nonseparable points and we may store many support vectors and overfit. If it is too small, we may have underfitting." Alpaydin (2004), page 224

The example above shows that the greater the cost parameter, the fewer support vectors we get. So what is wrong here?
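
One way to connect this to the dual formulation is to inspect the dual coefficients directly: in e1071, model$coefs holds the coefficients times the training labels (y_i * alpha_i) for the support vectors, so in a binary problem the support vectors whose coefficient magnitude reaches cost are the bounded ones, i.e. the margin-violating points the Alpaydin quote refers to. Below is a minimal sketch on a two-class subset of Glass; the subset, the cost value and the tolerance are arbitrary choices for illustration.

# Two-class subset, so that the dual coefficients are easy to interpret
glass2 <- droplevels(subset(Glass, Type %in% c("1", "2")))
C <- 10
m <- svm(Type ~ ., data = glass2, cost = C, gamma = 1)
alpha <- abs(m$coefs)               # |y_i * alpha_i| = alpha_i
c(total   = m$tot.nSV,
  bounded = sum(alpha >  C - 1e-6), # alpha_i at the C bound
  free    = sum(alpha <= C - 1e-6)) # margin SVs with 0 < alpha_i < C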

[EDIT 1] I exchanged some emails with the maintainer of the mentioned library, and he gave a counterexample.

"Basically, yes, but this is not linear, try: "

N = sapply(1:1000, function(i) svm(Species ~ ., data = iris, cost = i)$tot.nSV)
plot(N)

What's the performance of each model? If gamma = 1 results in too wide a radius for your data points, then they are all equally bad, and since the cost is too high and the SVM can select a better combination of SVs, it would tend to perform worse. – Pedrom
But gamma = 1/sigma^2, so shouldn't this lead to a relatively small radius? Anyway, changing gamma would not change the behaviour I am describing! – Shaki
Well, it depends. Depending on the scale of your data it might be big or small. Are you scaling your data? What's the range of values? For the RBF kernel, it is the combination of C and gamma that controls the number of SVs. – Pedrom
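
To illustrate the last comment, here is a rough sketch that counts the total number of support vectors over a small cost-by-gamma grid on the Glass data loaded above; the grid values are arbitrary, and note that svm() scales the data by default.

grid <- expand.grid(cost = 10^(-1:3), gamma = c(0.01, 0.1, 1))
grid$nSV <- mapply(function(C, g)
    svm(Type ~ ., data = Glass, cost = C, gamma = g)$tot.nSV,
  grid$cost, grid$gamma)
grid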

2 Answers

1 vote

I received this answer from the library maintainer:

Basically, yes, but this is not linear, try:

# total number of support vectors for cost = 1, 2, ..., 1000 on the iris data
N = sapply(1:1000, function(i) svm(Species ~ ., data = iris, cost = i)$tot.nSV)
plot(N)

0 votes

Your intuition is absolutely correct, but you need to see that your classification algorithm is dancing in an infinite-dimensional space about which you know nothing. Changing C from 1 to 1000 may take the classification boundary, literally, to a different part of the universe. Try to do your experiment in a smaller range of C and see how it changes. I varied C from 256000, 128000, 64000, ... 32, 16, 8, 4, 2 (halving every time) and found interesting behavior around C = 15, 8, 4, etc. You see, there are hundreds of points eligible to be support vectors, and the surface can curve any which way you want. So the rule "increasing C implies more support vectors" will be true only statistically. The exact number of SVs will vary depending on how the points are laid out and how the surface curves.
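
For reference, a sketch of the halving experiment described above, run on the Glass data from the question (the dataset and gamma = 1 are carried over from the question rather than specified in this answer):

library("e1071")
library("mlbench")
data(Glass, package = "mlbench")

# Halve the cost from 256000 down to about 2, recording the total SV count
costs <- 256000 / 2^(0:17)
n_sv  <- sapply(costs, function(C)
  svm(Type ~ ., data = Glass, cost = C, gamma = 1)$tot.nSV)
plot(log2(costs), n_sv, type = "b",
     xlab = "log2(cost)", ylab = "total number of support vectors")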