Too Much Data for SVM?

Question

So I'm running an SVM classifier (with a linear kernel and probability=False) from sklearn on a dataframe with about 120 features and 10,000 observations. The program takes hours to run and keeps crashing because it exceeds computational limits. Just wondering if this dataframe is perhaps too large?

That should be ok for a linear kernel (at least with LinearSVC; not sure about SVC with kernel=linear). Show us the code! — sascha

lejlot · Accepted Answer · 2016-08-03T08:09:28

In short: no, this is not too big at all. A linear SVM can scale much further. The libsvm library (which backs sklearn's SVC), on the other hand, cannot. The good news is that scikit-learn does have a large-scale SVM implementation: LinearSVC, which is based on liblinear. You can also solve it using SGD (SGDClassifier, also available in scikit-learn), which will converge for much bigger datasets as well.
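A minimal sketch of both options, since the question did not include any code. Everything here is an assumption for illustration: the data is a synthetic stand-in generated with make_classification (10,000 rows, 120 features), the hyperparameters (C, alpha, max_iter) are placeholders rather than tuned values, and the scaling step is a common precaution for linear models, not something taken from the original setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the asker's dataframe (assumption): 10,000 x 120.
X, y = make_classification(
    n_samples=10_000,
    n_features=120,
    n_informative=20,
    n_clusters_per_class=1,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Option 1: LinearSVC (liblinear). dual=False solves the primal problem,
# which is usually the better choice when n_samples > n_features.
linear_svc = make_pipeline(
    StandardScaler(),
    LinearSVC(C=1.0, dual=False, max_iter=5000),
)
linear_svc.fit(X_train, y_train)

# Option 2: a linear SVM trained with stochastic gradient descent.
# loss="hinge" gives the standard SVM objective.
sgd_svm = make_pipeline(
    StandardScaler(),
    SGDClassifier(loss="hinge", alpha=1e-4, max_iter=1000, tol=1e-3,
                  random_state=0),
)
sgd_svm.fit(X_train, y_train)

acc_linear_svc = linear_svc.score(X_test, y_test)
acc_sgd = sgd_svm.score(X_test, y_test)
print(f"LinearSVC accuracy:     {acc_linear_svc:.3f}")
print(f"SGDClassifier accuracy: {acc_sgd:.3f}")
```

Both models are linear, so they should reach similar accuracy on the same data. If the dataset eventually stops fitting in memory, SGDClassifier also offers partial_fit for training in batches.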
