
We know that logistic regression first estimates a probability for each observation from the fitted equation, and then applies a default cutoff to turn that probability into a class label.
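As a minimal sketch of those two steps, assuming a single feature and made-up coefficients (intercept -1, slope 2, purely illustrative), the linear score is passed through the logistic function, and a cutoff then turns each probability into a 0/1 label:

```python
import numpy as np

def predict_proba(X, intercept, coef):
    """Logistic model: P(y=1 | x) = 1 / (1 + exp(-(intercept + x . coef)))."""
    z = intercept + X @ coef
    return 1.0 / (1.0 + np.exp(-z))

def classify(prob, cutoff=0.5):
    """Step 2: label 1 when the probability is strictly above the cutoff."""
    return (prob > cutoff).astype(int)

# Hypothetical coefficients, chosen only for illustration
intercept, coef = -1.0, np.array([2.0])
X = np.array([[0.0], [0.5], [1.0], [1.5]])

prob = predict_proba(X, intercept, coef)   # roughly 0.269, 0.5, 0.731, 0.881
default_labels = classify(prob)            # default 0.5 cutoff
strict_labels = classify(prob, 0.75)       # custom 0.75 cutoff
```

The probabilities are the same in both calls; only the comparison in the second step changes.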

So, I want to know whether it is possible to change the default cutoff value (0.5) to 0.75, as my requirement demands. If yes, can someone help me with the code in R, Python, or SAS? If no, can someone explain why, with relevant proof?
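A minimal Python sketch suggesting the answer is yes, at least in scikit-learn: the cutoff is not stored in the fitted model, so a different one can be applied to the `predict_proba` output. The synthetic dataset from `make_classification` is an assumption standing in for real data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data (assumption: stands in for the real dataset)
X, y = make_classification(n_samples=500, n_features=5, random_state=0)

model = LogisticRegression().fit(X, y)
prob1 = model.predict_proba(X)[:, 1]   # P(y=1) for each row

# For a binary problem, predict() behaves like a 0.5 cutoff on P(y=1)
default_labels = model.predict(X)
labels_050 = (prob1 > 0.5).astype(int)

# A stricter 0.75 cutoff needs no refitting, only a different comparison
labels_075 = (prob1 > 0.75).astype(int)
```

Raising the cutoff can only move rows from class 1 to class 0, so `labels_075` never contains more positives than `labels_050`.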

While searching for an answer to this question, I found the following:

1.) We can find the optimal cutoff value, the one that gives the best possible accuracy, and build the confusion matrix accordingly.

R code to find the optimal cutoff and build the confusion matrix:

library(InformationValue)
optCutOff <- optimalCutoff(testData$ABOVE50K, predicted)[1]
confusionMatrix(testData$ABOVE50K, predicted, threshold = optCutOff)
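A rough Python analogue of searching for a cutoff: scan a grid of candidate cutoffs and keep the one with the lowest misclassification error. This is a plain grid search written for illustration; it is not a reimplementation of `InformationValue::optimalCutoff`, and the labels and probabilities below are hypothetical:

```python
import numpy as np

def misclassification_error(actuals, prob, cutoff):
    """Fraction of rows whose thresholded prediction differs from the actual label."""
    predicted = (prob > cutoff).astype(int)
    return float(np.mean(predicted != actuals))

def optimal_cutoff(actuals, prob, candidates=None):
    """Grid search: return (cutoff, error) for the candidate with the lowest
    misclassification error. Ties go to the smallest cutoff."""
    if candidates is None:
        candidates = np.linspace(0.01, 0.99, 99)  # 0.01, 0.02, ..., 0.99
    errors = [misclassification_error(actuals, prob, c) for c in candidates]
    best = int(np.argmin(errors))  # argmin returns the first minimum
    return float(candidates[best]), errors[best]

# Hypothetical labels and predicted probabilities
actuals = np.array([0, 0, 0, 1, 1, 1])
prob = np.array([0.125, 0.335, 0.645, 0.575, 0.815, 0.935])
cut, err = optimal_cutoff(actuals, prob)
```

Minimizing misclassification error is the same as maximizing accuracy, but other criteria (for example balancing sensitivity and specificity) can pick a different cutoff.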

Misclassification error:

misClassError(testData$ABOVE50K, predicted, threshold = optCutOff)
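A small sketch of how a confusion matrix and the misclassification error follow from a chosen cutoff, here 0.75. The layout (rows = actual 0/1, columns = predicted 0/1) follows scikit-learn's `confusion_matrix` convention and may differ from the orientation `InformationValue::confusionMatrix` prints; the data are hypothetical:

```python
import numpy as np

def confusion_at_cutoff(actuals, prob, cutoff=0.5):
    """2x2 confusion matrix: rows = actual class, columns = predicted class."""
    predicted = (prob > cutoff).astype(int)
    cm = np.zeros((2, 2), dtype=int)
    for a, p in zip(actuals, predicted):
        cm[a, p] += 1
    return cm

# Hypothetical labels and predicted probabilities
actuals = np.array([0, 0, 0, 1, 1, 1])
prob = np.array([0.125, 0.335, 0.645, 0.575, 0.815, 0.935])

cm = confusion_at_cutoff(actuals, prob, cutoff=0.75)
# Misclassification error = off-diagonal counts / total
error = (cm[0, 1] + cm[1, 0]) / cm.sum()
```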

Note: the cutoff value is changed when calculating the confusion matrix, but not when building the model. Can someone help me with this?
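One way to see why the cutoff only shows up at evaluation time: fitting logistic regression estimates coefficients, and the cutoff is applied afterwards to the predicted probabilities. The `CutoffClassifier` below is a hypothetical wrapper written for illustration that stores a cutoff alongside a scikit-learn model. Recent scikit-learn releases (1.5 and later) also ship `sklearn.model_selection.FixedThresholdClassifier` for this purpose; check your installed version before relying on it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

class CutoffClassifier:
    """Hypothetical wrapper: fits LogisticRegression as usual and applies a
    custom cutoff to P(y=1) in predict(). The cutoff plays no part in fit()."""

    def __init__(self, cutoff=0.5, **lr_kwargs):
        self.cutoff = cutoff
        self.model = LogisticRegression(**lr_kwargs)

    def fit(self, X, y):
        self.model.fit(X, y)
        return self

    def predict_proba(self, X):
        return self.model.predict_proba(X)

    def predict(self, X):
        prob1 = self.model.predict_proba(X)[:, 1]
        classes = self.model.classes_
        return np.where(prob1 > self.cutoff, classes[1], classes[0])

# Synthetic data (assumption: stands in for the real dataset)
X, y = make_classification(n_samples=300, n_features=4, random_state=1)
clf_050 = CutoffClassifier(cutoff=0.5).fit(X, y)
clf_075 = CutoffClassifier(cutoff=0.75).fit(X, y)

# Same data and solver: the fitted coefficients do not depend on the cutoff
same_coef = np.allclose(clf_050.model.coef_, clf_075.model.coef_)
```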

Reference link: http://r-statistics.co/Logistic-Regression-With-R.html


Answer

from sklearn.linear_model import LogisticRegression

# X_train, y_train: your training features and 0/1 labels
lr = LogisticRegression()
lr.fit(X_train, y_train)

First, call

lr.predict_proba(X_test)

to get the probability of each class: the first column is the probability of y=0 and the second column is the probability of y=1 (the columns follow the order of lr.classes_).
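A quick way to confirm which column belongs to which class is to check `lr.classes_`, which holds the sorted class labels in the same order as the `predict_proba` columns. The tiny string-labelled dataset below is hypothetical and only used to make the ordering visible:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data with string labels
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array(["no", "no", "no", "yes", "yes", "yes"])

lr = LogisticRegression().fit(X, y)
print(lr.classes_)   # ['no' 'yes']

proba = lr.predict_proba(X)
# Column j of predict_proba is the probability of lr.classes_[j]
pos_col = list(lr.classes_).index("yes")
prob_yes = proba[:, pos_col]
rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)
```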

# the probability of being y=1
prob1=lr.predict_proba(X_test)[:,1]

If we use 0.25 as the cutoff value, we predict as follows (use 0.75 instead for the cutoff asked about in the question):

predicted=[1 if i > 0.25 else 0 for i in prob1]
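The list comprehension above can also be written as a vectorized NumPy comparison with the cutoff as a parameter. In this sketch, `predict_with_cutoff` is a hypothetical helper name and the probabilities are made up to stand in for `lr.predict_proba(X_test)[:, 1]`:

```python
import numpy as np

def predict_with_cutoff(prob1, cutoff=0.5):
    """Vectorized version of the list comprehension: 1 where P(y=1) > cutoff."""
    return (np.asarray(prob1) > cutoff).astype(int)

# Hypothetical probabilities standing in for lr.predict_proba(X_test)[:, 1]
prob1 = np.array([0.10, 0.30, 0.60, 0.80])
print(predict_with_cutoff(prob1, 0.25))  # [0 1 1 1]
print(predict_with_cutoff(prob1, 0.75))  # [0 0 0 1]
```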