I have a data frame of about 500 rows and 170 columns. I am attempting to run a classification model with svm from the e1071 package. The classification variable is called 'SEGMENT', a factor variable with 6 levels. There are three other factor variables in the data frame, and the rest are numeric.
data <- my.data.frame
# Split into training and testing sets, training.data and testing.data
.
.
.
fit <- svm(SEGMENT ~ ., data = training.data, cost = 1, kernel = 'linear',
           probability = T, type = 'C-classification')
The model runs fine.
Parameters:
SVM-Type: C-classification
SVM-Kernel: linear
cost: 1
gamma: 0.0016
Number of Support Vectors: 77
( 43 2 19 2 2 9 )
Number of Classes: 6
Levels:
EE JJ LL RR SS WW
The problem arises when I try to test the model on testing.data, which is structured exactly like the training set:
x <- predict(fit, testing.data, decision.values = T, probability = T)
And then things blow up rather spectacularly:
Error in predict.svm(fit, newdata = testing, decision.values = T, probability = T) :
test data does not match model !
Ideas are most welcome.
str(testing.data). My guess is that the factor levels will be different. I'm also guessing that if you search SO for that error text, you will find this has been asked and answered several times before. – IRTFM
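To follow up on the factor-level guess: here is a minimal base-R sketch of how one might check for mismatched levels and realign them before calling predict. The toy data frames and the REGION and x1 columns are made up for illustration; only the names training.data, testing.data and SEGMENT come from the question. It assumes the mismatch comes from factor columns in the test set carrying fewer (or differently ordered) levels than in the training set, which may not be the only cause of this error.

```r
# Hypothetical toy data standing in for the real training.data / testing.data
training.data <- data.frame(
  SEGMENT = factor(c("EE", "JJ", "LL", "EE", "JJ", "LL")),
  REGION  = factor(c("north", "south", "east", "north", "south", "east")),
  x1      = c(1.2, 3.4, 2.2, 0.9, 3.8, 2.5)
)
testing.data <- data.frame(
  SEGMENT = factor(c("EE", "JJ")),
  REGION  = factor(c("north", "south")),
  x1      = c(1.0, 3.5)
)

# Names of the factor columns in the training set
factor.cols <- names(training.data)[vapply(training.data, is.factor, logical(1))]

# Which factor columns have different level sets in the two data frames?
same.levels <- vapply(factor.cols, function(col) {
  identical(levels(training.data[[col]]), levels(testing.data[[col]]))
}, logical(1))
mismatched <- factor.cols[!same.levels]
print(mismatched)

# Re-declare each test factor with the training levels.
# Values that never occurred in training would become NA here.
for (col in factor.cols) {
  testing.data[[col]] <- factor(testing.data[[col]],
                                levels = levels(training.data[[col]]))
}
```

After this step both data frames expand to the same set of dummy columns, so it would be worth re-running the predict call. If any test value turns into NA, that level was never seen in training and the model has no way to score it.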