Why is my detection score high inspite of obvious misclassifications during prediction?

Question

I am working on an intrusion classification problem using NSL-KDD dataset. I used 10 features (out of 42) for training after applying Recursive feature elimination technique using Random Forest Classifier as the estimator parameter and Gini index as criterion for splitting Decision tree. After training the classifier, I use same classifier to predict the classes of test data. My cross validation score (Accuracy, precision, recall, f-score) using cross_val_score of sklearn gave above 99 % scores for all the four scores. But plotting the confusion matrix showed otherwise with higher values seen in False positive and False negative values. Claerly, they are not matching with accuracy and all these scores. Where did I do wrong ?

# Train set contain X_train (dataframe of features) and Y_train (series 
# of target labels)
# Test set contain X_test and Y_test

# Classifier variable
clf = RandomForestClassifier(n_estimators = 10, criterion = 'gini')

#Training
clf.fit(X_train, Y_train)

# Testing
Y_pred = clf.predict(X_test)
pandas.crosstab(Y_test, Y_pred, rownames = ['Actual'], colnames = 
['Predicted'])

# Scoring
accuracy = cross_val_score(clf, X_test, Y_test, cv = 10, scoring = 
'accuracy')
print("Accuracy: %0.5f (+/- %0.5f)" % (accuracy.mean(), accuracy.std() * 
2))
precision = cross_val_score(clf, X_test, Y_test, cv = 10, scoring = 
'precision_weighted')
print("Precision: %0.5f (+/- %0.5f)" % (precision.mean(), precision.std() 
* 2))
recall = cross_val_score(clf, X_test, Y_test, cv = 10, scoring = 
'recall_weighted')
print("Recall: %0.5f (+/- %0.5f)" % (recall.mean(), recall.std() * 2))
f = cross_val_score(clf, X_test, Y_test, cv = 10, scoring = 'f1_weighted')
print("F-Score: %0.5f (+/- %0.5f)" % (f.mean(), f.std() * 2))

I got accuracy, precision, recall and f-score of

Accuracy 0.99825 
Precision 0.99826
Recall 0.99825
F-Score 0.99825

However, the confusion matrix showed otherwise

Predicted 9670    41
Actual    5113    2347

Am I training the whole thing wrong or is it just misclassification problem from poor feature selection?

Aniket Gaikwad Aniket Gaikwad · Accepted Answer · 2019-11-17T04:17:26

Your predicted values are stored in y_pred.

accuracy_score(y_test,y_pred)

Just check whether this works...

Why is my detection score high inspite of obvious misclassifications during prediction?

2 Answers