0
votes

I have a binary classification problem where I have a few great features that have the power to predict almost 100% of the test data because the problem is relatively simple.

However, as the nature of the problem requires, I have no luxury to make mistake(let's say) so instead of giving a prediction I am not sure of, I would rather have the output as probability, set a threshold and would be able to say, "if I am less than %95 sure, I will call this "NOT SURE" and act accordingly". Saying "I don't know" rather than making a mistake is better.

So far so good.

For this purpose, I tried Gaussian Bayes Classifier(I have a cont. feature) and Logistic Regression algorithms, which provide me the probability as well as the prediction for the classification.

Coming to my Problem:

  • GBC has around 99% success rate while Logistic Regression has lower, around 96% success rate. So I naturally would prefer to use GBC. However, as successful as GBC is, it is also very sure of itself. The odds I get are either 1 or very very close to 1, such as 0.9999997, which makes things tough for me, because in practice GBC does not provide me probabilities now.

  • Logistic Regression works poor, but at least gives better and more 'sensible' odds.

As nature of my problem, the cost of misclassifying is by the power of 2 so if I misclassify 4 of the products, I lose 2^4 more (it's unit-less but gives an idea anyway).

In the end; I would like to be able to classify with a higher success than Logistic Regression, but also be able to have more probabilities so I can set a threshold and point out the ones I am not sure of.

Any suggestions?

Thanks in advance.

1

1 Answers

1
votes

If you have enough data, you can simply retune the probabilities. For example, given the "predicted probability" output of your gaussian classifier, you can go back through (on a held out dataset) and at different prediction values, estimate the probability of the positive class.

Further, you can simply set up an optimization on your holdout set to determine the best threshold(without actually estimating the probability). Since it's one dimensional, you shouldn't even need to do anything fancy for optimization-- test like 500 different thresholds and pick the one which minimizes the costs associated with misclassifications.