
I'm trying to use a naive Bayes classifier to classify my dataset. My questions are:

1- Usually, when we try to calculate the likelihood, we use the formula:

P(c|x) ∝ P(x1|c) * P(x2|c) * ... * P(xn|c) * P(c)

But some examples say that, in order to avoid getting very small results, we should use:

P(c|x) ∝ exp(log P(x1|c) + log P(x2|c) + ... + log P(xn|c) + log P(c))

Can anyone explain the difference between these two formulas? Are they both used to calculate the "likelihood", or is the second one used to calculate something called "information gain"?

2- In some cases, when we try to classify our dataset, some joint counts are zero, which makes the corresponding conditional probabilities zero. Some people use the "Laplace smoothing" technique in order to avoid these zero joints. Doesn't this technique affect the accuracy of our classification?
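
For example, this is the kind of add-one smoothing I mean (toy counts I made up, not my real data):

    # Toy counts of one feature's values within a single class; "blue" was
    # never seen with this class, so its unsmoothed probability would be 0.
    counts = {"red": 3, "green": 2, "blue": 0}
    k = len(counts)                  # number of possible feature values
    total = sum(counts.values())

    # Laplace (add-one) smoothing: add 1 to every count so that no
    # conditional probability is exactly zero.
    smoothed = {v: (c + 1) / (total + k) for v, c in counts.items()}
    print(smoothed)  # {'red': 0.5, 'green': 0.375, 'blue': 0.125}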

Thanks in advance for all your time. I'm new to this algorithm and trying to learn more about it, so are there any recommended papers I should read? Thanks a lot.


1 Answer


I'll take a stab at your first question. I think the equation you are ultimately driving towards is:

log P(c|x) = log P(x1|c) + log P(x2|c) + ... + log P(xn|c) + log P(c) + const.

(The constant is -log P(x), which is the same for every class, so it can be ignored when you're comparing classes.)

If so, the examples are pointing out that in statistical calculations it is often easier to work with the logarithm of a distribution function than with the distribution function itself.
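
For instance, here is a minimal sketch of scoring classes in the log domain (the class names and log-probability values are made up for illustration, not from a real trained model):

    import math

    # Hypothetical log-priors log P(c) and per-feature log-likelihoods
    # log P(x_i | c), as they might come out of training.
    log_prior = {"spam": math.log(0.4), "ham": math.log(0.6)}
    log_likelihood = {
        "spam": [math.log(0.80), math.log(0.05), math.log(0.30)],
        "ham":  [math.log(0.10), math.log(0.40), math.log(0.20)],
    }

    # Score each class by summing logs instead of multiplying probabilities.
    scores = {c: log_prior[c] + sum(log_likelihood[c]) for c in log_prior}

    # The argmax is unchanged, because log is monotonically increasing.
    print(scores, max(scores, key=scores.get))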

Practically speaking, this is related to the fact that many statistical distributions involve an exponential function. For example, you can find where the maximum of a Gaussian distribution K*exp(-s_0*(x-x_0)^2) occurs by solving the mathematically less complex problem (if we're going through the whole formal process of taking derivatives and finding equation roots) of finding where the maximum of its logarithm, log(K) - s_0*(x-x_0)^2, occurs.
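
As a quick sanity check (this uses sympy purely for illustration; the symbols mirror the formula above), the function and its logarithm peak at the same point:

    import sympy as sp

    x, K, s0, x0 = sp.symbols("x K s_0 x_0", positive=True)

    # The Gaussian-shaped function and its logarithm
    # (which equals log(K) - s_0*(x - x_0)**2 for positive K).
    f = K * sp.exp(-s0 * (x - x0) ** 2)
    log_f = sp.log(f)

    # Setting each derivative to zero yields the same maximizer, x = x_0.
    print(sp.solve(sp.diff(f, x), x))      # [x_0]
    print(sp.solve(sp.diff(log_f, x), x))  # [x_0]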

This leads to many places where "take the logarithm of both sides" is a standard step in an optimization calculation.

Also, computationally, when you are optimizing likelihood functions that may involve many multiplicative terms, adding the logarithms of small floating-point numbers is much less likely to cause numerical problems (in particular, underflow to zero) than multiplying those small floating-point numbers together.
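
To make that concrete, here is a small illustration (the value 0.001 and the count 200 are arbitrary):

    import math

    p = 0.001   # an arbitrarily small per-feature probability
    n = 200     # number of multiplicative terms

    # The direct product underflows to exactly 0.0 in double precision...
    product = 1.0
    for _ in range(n):
        product *= p
    print(product)          # 0.0

    # ...while the equivalent sum of logs stays perfectly representable.
    print(n * math.log(p))  # -1381.55...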