2
votes

So I'm running a SVM classifier (with a linear kernel and probability false) from sklearn on a dataframe with about 120 features and 10,000 observations. The program takes hours to run and keeps crashing due to exceeding computational limits. Just wondering if this dataframe is perhaps too large?

3
That should be ok for a linear kernel (at least with LinearSVC; not sure about SVC with kernel=linear). Show us the code! - sascha

3 Answers

2
votes

In short no, this is not too big at all. Linear svm can scale much further. The libSVC library on the other hand cannot. The good thing, even in scikit-learn you do have large scale svm implementation - LinearSVC which is based on liblinear. You can also solve it using SGD (also available in scikitlearn) which will converge for much bigger datasets as well.

0
votes

You could try changing the parameters for the algorithm.

Tips on practical use from the documentation.

You could try a different algorithm, here's a cheat sheet you might find helpful:

enter image description here

0
votes

The implementation is based on libsvm. The fit time complexity is more than quadratic with the number of samples which makes it hard to scale to dataset with more than a couple of 10000 samples.

The offical data about sklearn svm told the theshold is 10,000 samples so SGD could be a better try.