I load data in LIBSVM format into a SciPy sparse matrix. The training set is multi-label and multi-class, as described in an earlier question I asked: Understanding format of data in scikit-learn
from sklearn.datasets import load_svmlight_file
X, Y = load_svmlight_file("train-subset100.csv.csv", multilabel=True, zero_based=True)
Then I use OneVsRestClassifier with LinearSVC to train on the data.
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
clf = OneVsRestClassifier(LinearSVC())
clf.fit(X, Y)
Now when I want to test the data, I do the following.
X_, Y_ = load_svmlight_file("train-subset10.csv", multilabel=True, zero_based=False)
predicted = clf.predict(X_)
This raises an error. Here is the full traceback:
Traceback (most recent call last):
  File "test.py", line 36, in <module>
    predicted = clf.predict(X_)
  File "/usr/lib/pymodules/python2.7/sklearn/multiclass.py", line 151, in predict
    return predict_ovr(self.estimators_, self.label_binarizer_, X)
  File "/usr/lib/pymodules/python2.7/sklearn/multiclass.py", line 67, in predict_ovr
    Y = np.array([_predict_binary(e, X) for e in estimators])
  File "/usr/lib/pymodules/python2.7/sklearn/multiclass.py", line 40, in _predict_binary
    return np.ravel(estimator.decision_function(X))
  File "/usr/lib/pymodules/python2.7/sklearn/svm/base.py", line 728, in decision_function
    self._check_n_features(X)
  File "/usr/lib/pymodules/python2.7/sklearn/svm/base.py", line 748, in _check_n_features
    X.shape[1]))
ValueError: X.shape[1] should be 3421, not 690.
I do not understand why it expects more features when the input is a sparse matrix. How can I get it to predict the test labels correctly?
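For reference, here is a minimal sketch that seems to reproduce the same kind of shape mismatch without my real files. The byte strings are made-up stand-ins for the two files, not my actual data. It suggests that load_svmlight_file infers the number of columns from the largest feature index present in each file, so a smaller test file can yield a narrower matrix. The loader also accepts an n_features argument, which may be the way to force both matrices to the same width; I am assuming that is the intended use.

```python
from io import BytesIO

from sklearn.datasets import load_svmlight_file

# Made-up stand-ins for the real files: the training data uses feature
# indices up to 5, while the test data only uses indices up to 2.
train_bytes = b"1,2 1:0.5 5:1.0\n3 2:1.0 4:2.0\n"
test_bytes = b"1 1:0.3 2:0.7\n"

# Without n_features, the width is inferred from each file separately.
X_train, y_train = load_svmlight_file(
    BytesIO(train_bytes), multilabel=True, zero_based=False
)
X_test, y_test = load_svmlight_file(
    BytesIO(test_bytes), multilabel=True, zero_based=False
)
print(X_train.shape, X_test.shape)  # the two widths differ

# Passing the training width explicitly pads the test matrix with
# empty columns so that both matrices have the same number of features.
X_test_fixed, _ = load_svmlight_file(
    BytesIO(test_bytes),
    n_features=X_train.shape[1],
    multilabel=True,
    zero_based=False,
)
print(X_test_fixed.shape)
```

Note that this sketch uses the same zero_based setting for both loads, whereas my code above uses True for training and False for testing, which may be a separate inconsistency.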