I am totally new to machine learning and I'm trying to use scikit-learn to build a simple logistic regression model with one input variable (X) and a binary outcome (Y). My data consists of 325 samples: 39 successes and 286 failures. I split the data into a training set (70%) and a test set (30%).
My goal is actually to obtain the predicted probability of success for any given X, not classification per se: I'll be feeding the predicted probabilities into a separate model I'm building and won't be using the logistic regression as a classifier at all. So it's important that the predicted probabilities actually fit the data.
However, I'm having trouble telling whether my model fits the data well, and in particular whether the computed probabilities are accurate.
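For reference, here's a simplified sketch of my setup (df is just a placeholder for my actual DataFrame, and the column names are made up):

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

# X is my single input variable, Y is the binary outcome (1 = success);
# df stands in for my actual 325-sample DataFrame
X = df[['x']]
Y = df['y']

# 70/30 train/test split
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0)

model = LogisticRegression()
model.fit(X_train, Y_train)

predicted = model.predict(X_test)    # hard 0/1 predictions
probs = model.predict_proba(X_test)  # column 1 = predicted P(success)
```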
I am getting the following metrics:
Classification accuracy: metrics.accuracy_score(Y_test, predicted) = 0.92. My understanding of this metric is that the model makes correct predictions 92% of the time, so it looks to me like the model is a good fit.
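Since my classes are imbalanced (88 vs. 10 in the test set), I also tried comparing against a baseline that always predicts the majority class, something like:

```python
from sklearn.dummy import DummyClassifier

# Baseline that always predicts the majority class (0 = failure)
dummy = DummyClassifier(strategy='most_frequent')
dummy.fit(X_train, Y_train)

# With 88 of the 98 test samples in class 0, this baseline scores
# 88/98 ~ 0.90, so my model's 0.92 is only a little better
print(metrics.accuracy_score(Y_test, dummy.predict(X_test)))
```

Is that the right way to think about the accuracy here?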
Log loss: cross_val_score(LogisticRegression(), X, Y, scoring='neg_log_loss', cv=10) = -0.26 (i.e., a log loss of 0.26, since the scorer is negated). This is probably the most confusing metric for me, and apparently the most important one, since it measures the quality of the predicted probabilities directly. I know that the closer the log loss is to zero the better, but how close is close enough?
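To get a sense of scale, I tried comparing it against the log loss of a constant prediction at the base rate of success (39/325 = 0.12), something like:

```python
import numpy as np
from sklearn.metrics import log_loss

# Log loss of always predicting p = 0.12 (the overall success rate);
# this works out to about 0.37, which my model's 0.26 does beat
base_rate = 39.0 / 325
baseline = log_loss(Y, np.full(len(Y), base_rate))
print(baseline)
```

Is beating that baseline enough, or is there a more principled way to judge the log loss?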
AUC: metrics.roc_auc_score(Y_test, probs[:, 1]) = 0.9. Again, this looks good, since the closer the AUC is to 1, the better.
Confusion Matrix: metrics.confusion_matrix(Y_test, predicted) =
[[88  0]
 [ 8  2]]
My understanding here is that the diagonal gives the number of correct predictions on the test set (not the training set, since it's computed from Y_test), so this looks OK.
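For what it's worth, here's how I'm unpacking the matrix, which also shows where the numbers in the report below come from:

```python
# For binary problems, rows are the true classes and columns the
# predicted classes, so ravel() gives tn, fp, fn, tp
tn, fp, fn, tp = metrics.confusion_matrix(Y_test, predicted).ravel()

# In my case: tn=88, fp=0, fn=8, tp=2
print(tp / float(tp + fn))  # recall for class 1: 2/10 = 0.2
print(tp / float(tp + fp))  # precision for class 1: 2/2 = 1.0
```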
Report: metrics.classification_report(Y_test, predicted) =
             precision    recall  f1-score   support

        0.0       0.92      1.00      0.96        88
        1.0       1.00      0.20      0.33        10

avg / total       0.93      0.92      0.89        98
According to this classification report, the model has good precision, which suggests a good fit. But I'm not sure how to interpret the recall, or whether this report is bad news for my model: the sklearn documentation states that recall is a model's ability to find all the positive samples, so a score of 0.2 for class 1.0 would mean it only finds the positives 20% of the time? That sounds like a really bad fit to the data.
I'd really appreciate it if someone could confirm that I am interpreting these metrics the right way, and perhaps shed some light on whether my model is good or bogus. Also, if there are any other tests I could do to determine whether the computed probabilities are accurate, please let me know.
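One idea I came across but haven't tried yet is a calibration (reliability) curve, which as far as I understand compares binned predicted probabilities to the observed success frequencies in each bin. A rough sketch (the bin count is an arbitrary choice on my part):

```python
from sklearn.calibration import calibration_curve

# A well-calibrated model should give points close to the y = x line
prob_true, prob_pred = calibration_curve(Y_test, probs[:, 1], n_bins=5)
print(prob_pred)  # mean predicted probability in each bin
print(prob_true)  # observed fraction of successes in each bin
```

Would that be a sensible check for whether the probabilities are accurate?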
If these aren't good metric scores, I'd really appreciate some direction on where to go next in terms of improvement.
Thanks!!