3 votes

I'm using LibSVM in a 5x2 cross-validation to classify a very large amount of data: I have 47k samples for training and 47k samples for testing, in 10 different configurations.

I usually use LibSVM's script easy.py to classify the data, but it's taking so long that I've been waiting for results for more than 3 hours with nothing to show, and I still have to repeat this procedure 9 more times!

Does anybody know how to use LibSVM faster with a very large amount of data? Do the C++ LibSVM functions work faster than the Python functions?


3 Answers

6 votes

LibSVM's training algorithm doesn't scale up to this kind of dataset; it takes O(n³) time in the worst case and around O(n²) on typical ones. The first thing to try is scaling your datasets properly; if that still doesn't help, switch to a linear SVM implementation such as LIBLINEAR.
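If a linear decision function is acceptable, here's a minimal sketch of that scale-then-train workflow. It assumes scikit-learn is available: LinearSVC is scikit-learn's LIBLINEAR wrapper, not part of LibSVM's easy.py, and the synthetic data stands in for your 47k-sample splits.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.svm import LinearSVC   # wraps LIBLINEAR, trains in roughly linear time

    # Synthetic placeholder data; substitute your own features and labels here.
    X, y = make_classification(n_samples=94000, n_features=50, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

    # Scale every feature to [0, 1]; fit the scaler on the training split only.
    scaler = MinMaxScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    # A linear SVM: only C needs tuning, no gamma grid as with the RBF kernel.
    clf = LinearSVC(C=1.0)
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))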

3 votes

As larsmans mentioned, libsvm may not scale all that well depending on the dimensionality of the data and the number of data points.

The C implementation may run a bit faster, but it won't be a significant difference. You have a few options available to you.

  • You could randomly sample your data and work on a small subset of it.
  • You could project your data into a lower-dimensional space with something like PCA (see the sketch after this list).
  • Depending on your data type, you could look into different kernels. Would a histogram intersection kernel work for your data? Are you using an RBF kernel when you really only need a linear decision function?
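A rough sketch of the first two options, assuming scikit-learn and NumPy; the array shapes, the 200-feature dimensionality, and the subset size are made up for illustration, so substitute your own data loaded from the LibSVM files:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(47000, 200))    # placeholder for your training features
    y = rng.integers(0, 2, size=47000)   # placeholder labels

    # Option 1: train on a random subset (here 10k of the 47k samples).
    idx = rng.choice(len(X), size=10000, replace=False)
    X_sub, y_sub = X[idx], y[idx]

    # Option 2: project onto the top 50 principal components before training.
    pca = PCA(n_components=50)
    X_reduced = pca.fit_transform(X)
    print(X_sub.shape, X_reduced.shape)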

Hope this helps! One of the toughest problems in machine learning is coping with the sheer magnitude of data required at times.

0 votes

easy.py is a script for training and evaluating a classifier; it does a meta-training of the SVM parameters via a grid search with grid.py. grid.py has a parameter "nr_local_worker" that defines the number of worker threads. You might wish to increase it (and check the processor load).
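For example, a sketch of that edit; where exactly nr_local_worker is defined (and its default value) depends on your LibSVM version, so check your own copy of grid.py:

    # Near the top of grid.py (exact line and default may differ between versions):
    nr_local_worker = 1   # default: one local worker process

    # Bump it to roughly the number of CPU cores, e.g.:
    nr_local_worker = 4   # runs four parameter evaluations in parallel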