
I'm relatively new to machine learning and would like some help with the following:

I ran a Support Vector Machine Classifier (SVC) on my data with 10-fold cross validation and calculated the accuracy score (which was around 89%). I'm using Python and scikit-learn to perform the task. Here's a code snippet:

from sklearn import svm
from sklearn.model_selection import train_test_split, cross_val_score

def get_scores(features, target, classifier):
    # Hold out 30% of the data as a test set
    X_train, X_test, y_train, y_test = train_test_split(
        features, target, test_size=0.3)
    # 10-fold cross-validation on the training portion only
    scores = cross_val_score(
        classifier,
        X_train,
        y_train,
        cv=10,
        scoring='accuracy',
        n_jobs=-1)
    return scores

get_scores(features_from_df, target_from_df, svm.SVC())

Now, how can I use my classifier (after running the 10-fold CV) to make predictions on X_test and compare them to y_test? As you may have noticed, I only used X_train and y_train in the cross-validation process.

I noticed that sklearn has cross_val_predict (http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_predict.html). Should I replace my cross_val_score with cross_val_predict? Just FYI: my target data column is binarized (it only has values of 0 and 1).
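
From the docs, my understanding is that cross_val_predict gives one out-of-fold prediction per sample instead of one score per fold; a minimal sketch of what I mean (reusing X_train and y_train from my function above):

from sklearn import svm
from sklearn.model_selection import cross_val_predict

# One cross-validated prediction (0 or 1) for every sample in X_train,
# as opposed to cross_val_score, which returns one accuracy per fold
oof_predictions = cross_val_predict(svm.SVC(), X_train, y_train, cv=10)
print(oof_predictions[:10])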

If my approach is wrong, please advise me on the best way to proceed.

Thanks!


2 Answers

0 votes

You're almost there:

# Build your classifier
classifier = svm.SVC()

# Train it on the entire training data set
classifier.fit(X_train, y_train)

# Get predictions on the test set
y_pred = classifier.predict(X_test)

At this point, you can use any metric from the sklearn.metrics module to determine how well you did. For example:

from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))
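
Since your target is binary, you could also look at a confusion matrix or a per-class report; a quick sketch using the same y_test and y_pred as above:

from sklearn.metrics import confusion_matrix, classification_report

# Rows are the true classes (0, 1), columns are the predicted classes
print(confusion_matrix(y_test, y_pred))

# Per-class precision, recall and F1, plus overall accuracy
print(classification_report(y_test, y_pred))
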
7 votes

You only need your X and y; there is no need to split them into train and test sets yourself, because cross_val_score does the splitting internally.

Then you can pass your classifier, in your case the SVC, to the cross_val_score function to get the accuracy for each fold.

In just 3 lines of code:

clf = svm.SVC(kernel='linear', C=1)
scores = cross_val_score(clf, X, y, cv=10)
print(scores)
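
If you want a single summary number across the ten folds, scores is a NumPy array, so for example:

# Mean accuracy over the 10 folds and its spread
print("Accuracy: %0.3f (+/- %0.3f)" % (scores.mean(), scores.std()))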