TL;DR: scikit-learn's roc_curve function is only returning 3 points for a certain dataset. Why could this be, and how do we control how many points we get back?
I'm trying to draw a ROC curve, but consistently get a "ROC triangle".
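from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

lr = LogisticRegression(multi_class='multinomial', solver='newton-cg')
y = data['target'].values
X = data[['feature']].values
model = lr.fit(X, y)
# get log-probabilities for each class (one column per class)
probas_ = model.predict_log_proba(X)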
Just to make sure the lengths are ok:
print(len(y))
print(len(probas_[:, 1]))
Returns 13759 on both.
Then running:
false_pos_rate, true_pos_rate, thresholds = roc_curve(y, probas_[:, 1])
print(false_pos_rate)
returns [ 0. 0.28240129 1. ]
If I print thresholds, I get array([ 0.4822225 , -0.5177775 , -0.84595197]) (always only 3 points).
It is therefore no surprise that my ROC curve looks like a triangle.
What I cannot understand is why scikit-learn's roc_curve is only returning 3 points. Any help is hugely appreciated.
What is in probas_[:,1]? Although it has a length of 13759, it may only contain 3 values... – pyan
I ran print(pd.Series(probas_[:,1]).unique()), and indeed only 2 unique values ([-0.84595197 -0.5177775 ]) were returned – sapo_cosmico
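To illustrate what the comments found: roc_curve can only place a threshold at each distinct score value, plus one extra starting point at (0, 0), so a score column with 2 unique values gives 3 points. The sketch below uses made-up data (the labels, the random seed and the variable names are assumptions for illustration, not the asker's dataset). It also passes drop_intermediate=False, the roc_curve parameter that keeps every candidate threshold, to show that more points only appear when the scores themselves take more distinct values.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Hypothetical binary labels (not the asker's data).
y_true = rng.integers(0, 2, size=1000)

# A score that takes only two distinct values, mimicking the two unique
# log-probabilities reported in the comments above.
two_valued_scores = np.where(rng.random(1000) < 0.5, -0.84595197, -0.5177775)

fpr, tpr, thresholds = roc_curve(y_true, two_valued_scores)
# One point per distinct score value, plus the extra starting point.
n_points_two_valued = len(thresholds)
print(n_points_two_valued)  # 3

# A continuous score has many distinct values, so many thresholds exist.
# drop_intermediate=False keeps all of them instead of pruning
# thresholds that would not change the shape of the plotted curve.
continuous_scores = rng.normal(size=1000) + y_true
fpr_c, tpr_c, thresholds_c = roc_curve(
    y_true, continuous_scores, drop_intermediate=False
)
n_points_continuous = len(thresholds_c)
print(n_points_continuous)
```

If the same pattern holds in the question, the number of points is set by the model's output rather than by roc_curve. One possible cause (an assumption, not confirmed in the thread) is that 'feature' itself has only two distinct values, which np.unique(X) would show.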