0
votes

I have two multi-class data sets with 5 labels, one for training, and the other for cross validation. These data sets are stored as .csv files, so they act as a control in this experiment.

I have a C++ wrapper for libsvm, and the MATLAB functions for libsvm.

For both C++ and MATLAB: Using a C-type SVM with an RBF kernel, I iterate over 2 lists of C and Gamma values. For each parameter combination, I train on the training data set and then predict the cross validation data set. I store the accuracy of the prediction in a 2D map which correlates to the C and Gamma value which yielded the accuracy.

I've recreated different training and cross validation data sets many, many times. Each time, the C++ and MATLAB accuracies are different; sometimes by a lot! Mostly MATLAB produces higher accuracies, but sometimes the C++ implementation is better.

What could be accounting for these differences? The C/Gamma values I'm trying are the same, as are the remaining SVM parameters (default).

1

1 Answers

4
votes

There should be no significant differences as both C and Matlab codes use the same svm.c file. So what can be the reason?

  • implementation error in your code(s), this is unfortunately the most probable one
  • used wrapper has some bug and/or use other version of libsvm then your matlab code (libsvm is written in pure C and comes with python, Matlab and java wrappers, so your C++ wrapper is "not official") or your wrapper assumes some additional default values, which are not default in C/Matlab/Python/Java implementations
  • you perform cross validation in somewhat randomized form (shuffling the data and then folding, which is completely correct and reasonable, but will lead to different results in two different runs)
  • There is some rounding/conversion performed during loading data from .csv in one (or both) of your codes which leads to inconsistencies (really not likely to happen, yet still possible)