How do I get prediction accuracy when testing unknown data on a saved model in Scikit-Learn?

Question

I have trained a model for binary classification, and I now want to use it to predict the class of elements whose class is unknown.

     import joblib  # standalone package; sklearn.externals.joblib is gone from current scikit-learn
     model = joblib.load('../model/randomForestModel.pkl')
     test_data = df_test.values  # df_test is a DataFrame with my test data
     output = model.predict(test_data[:, 1:])  # each prediction is either 1 or 0

I know how to get confusion_matrix, accuracy_score, and classification_report given the training dataset, but in this case I do not have the training data. I would like to get something akin to this output from Weka:

       inst#     actual  predicted error prediction
           1        1:?        1:0       0.757

Is it possible in scikit-learn? If so, how do I do it?

MathiasDesch · Accepted Answer · 2015-03-18T16:59:30

Yes, that's entirely possible.

1) To evaluate a model that you trained, you should use a test set: a subset of your labelled data that was not used for training, so that it measures how well the model predicts new values. Because the true values of the test set are known, you can compare them with the predictions. You can simply use the train_test_split function or cross-validation.
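As a minimal sketch of point 1, the snippet below holds out part of a labelled dataset and scores the model on it. The synthetic data from make_classification, the 25% split, and the variable names are illustrative assumptions, not part of the original question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for the real labelled dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out 25% of the labelled rows; the model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Compare the predictions with the known labels of the held-out rows.
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print("held-out accuracy:", test_accuracy)
```

Accuracy can only be computed where the true labels are known, which is why the held-out rows have to come from the labelled data rather than from the unknown samples.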

2) Scikit-learn provides various metrics for evaluating a model. Again, you should compute these metrics on a test set and not on your training set; evaluating on the training data can give misleadingly good results.
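To illustrate point 2, here is a hedged sketch that computes the metrics named in the question on a held-out test set and prints the training-set accuracy next to it for comparison. The noisy synthetic data (flip_y=0.2) is an assumption chosen only to make the gap between the two numbers visible.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic data with some label noise (illustrative assumption).
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Accuracy on the rows the model was fitted on is usually optimistic.
train_acc = accuracy_score(y_train, model.predict(X_train))

# Accuracy on unseen rows is the number that reflects predictive ability.
y_pred = model.predict(X_test)
test_acc = accuracy_score(y_test, y_pred)
print("train accuracy:", train_acc)
print("test accuracy: ", test_acc)

# Rows of the confusion matrix are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred))
```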

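I don't see any reason why you would not have access to the training set. You can also use the model's score method, which returns the mean accuracy for classifiers; for other measures (F1 score, recall, precision), use the functions in sklearn.metrics or the scoring parameter of the cross-validation helpers.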
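A short sketch of the scoring options mentioned above, again on assumed synthetic data: the classifier's score method gives mean accuracy, the sklearn.metrics functions give the other measures, and cross_val_score accepts a scoring string.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for a labelled dataset (illustrative assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# For classifiers, score() is the mean accuracy on the given data and labels.
mean_accuracy = model.score(X_test, y_test)

# Other measures come from sklearn.metrics and take (true, predicted) labels.
y_pred = model.predict(X_test)
scores = {
    "f1": f1_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
}

# Cross-validation refits a fresh model per fold; scoring selects the metric.
cv_f1 = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y, cv=5, scoring="f1",
)

print("accuracy:", mean_accuracy)
print(scores)
print("5-fold F1:", cv_f1.mean())
```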

In Weka, I do not see what "error prediction" refers to. Can you explain?
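If the last column of the Weka output is the probability assigned to the predicted class (an assumption, since the question does not say), a comparable per-instance listing can be sketched with predict_proba. No accuracy can be computed here because the true classes are unknown; the synthetic data and the column layout below are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: the first 250 rows play the labelled training set,
# the last 50 play the unknown samples (illustrative assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:250], y[:250])
X_unlabelled = X[250:]

predicted = model.predict(X_unlabelled)

# One row per sample, one column per class, in the order of model.classes_.
proba = model.predict_proba(X_unlabelled)

# Probability the model assigns to the class it actually predicted.
confidence = proba.max(axis=1)

print("inst#  actual  predicted  prediction")
for i, (label, conf) in enumerate(zip(predicted[:5], confidence[:5]), start=1):
    print(f"{i:5d}       ?  {int(label):9d}  {conf:10.3f}")
```

For a saved model, the same two calls apply to the object returned by joblib.load, provided the estimator implements predict_proba.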
