
I have a classification problem with 10 features and I have to predict 1 or 0. When I train an SVC model with a train-test split, all the predicted values for the test portion of the data come out to be 0. The data has the following 0/1 counts:

  • 0: 1875
  • 1: 1463

The code to train the model is given below:

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

model = SVC()
model.fit(X_train, y_train)
pred = model.predict(X_test)
accuracy_score(y_test, pred)

Why does it predict 0 for all the cases?


2 Answers


The model predicts the more frequent class, even though the dataset is not very imbalanced. It is very likely that the class cannot be predicted from the features as they are right now.

  • You may try normalizing (scaling) the features; see the sketch after this list.
  • Another thing you might want to try is to have a look at how correlated the features are with each other. Having highly correlated features might also prevent the model from converging.
  • Also, you might have chosen the wrong features.
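Here is a minimal sketch of both ideas, assuming X_train, X_test, y_train, y_test are the same splits as in your question; the pipeline and the correlation check are just illustrations, not the only way to do this:

import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Scale the features before the SVC; the default RBF kernel is distance-based,
# so features on very different scales can push all predictions to one class.
scaled_model = make_pipeline(StandardScaler(), SVC())
scaled_model.fit(X_train, y_train)
print(accuracy_score(y_test, scaled_model.predict(X_test)))

# Check pairwise feature correlations; values close to +/-1 point to redundant features.
print(pd.DataFrame(X_train).corr())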

For a classification problem, it is always good to run a dummy classifier as a starting point. This gives you a baseline that any real model should beat.

You can use this code:

from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Baseline that always predicts the most frequent class seen in the training data
dummy_classifier = DummyClassifier(strategy="most_frequent")
dummy_classifier.fit(X_train, y_train)
pred_dum = dummy_classifier.predict(X_test)
accuracy_score(y_test, pred_dum)

This gives you the accuracy you would get by always predicting the most frequent class. If it is, for example, 100%, you only have one class in your dataset; 80% means that 80% of your data belongs to one class. With the counts from your question (1875 zeros, 1463 ones), always predicting 0 would give roughly 1875 / 3338 ≈ 56% accuracy.

As a first step, you can adjust your SVC:

model = SVC(C=1.0, kernel='rbf', random_state=42)

C : float, optional (default=1.0). Penalty parameter C of the error term.

kernel : Specifies the kernel type to be used in the algorithm. It must be one of 'linear', 'poly', 'rbf', 'sigmoid' or 'precomputed'.

This can give you a starting point.
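If the defaults still collapse to one class, a small grid search over C and kernel is one way to pick better values; the grid below is only a sketch with illustrative values, not a recommendation from the SVC documentation:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative grid; adjust the ranges to your data.
param_grid = {"C": [0.1, 1.0, 10.0, 100.0], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(random_state=42), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)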

On top of that, you should also run a prediction on your training data and compare the two scores to see whether you are over- or underfitting.

trainpred = model.predict(X_train)
accuracy_score(y_train, trainpred)  # compare this training accuracy with the test accuracy above