Which variant of Naive Bayes does NLTK implement: Bernoulli, multinomial, Gaussian, or something else? I read through the documentation, but it is too general to tell.

I understand that scikit-learn has 4 versions of Naive Bayes, only two of which are suitable for text processing.

As I am doing text processing, I am finding a significant difference between the NLTK Naive Bayes and the scikit-learn one.

1 Answer

The NLTK Naive Bayes is of the multinomial variety, which is common for text classification. One clue is that Gaussian Naive Bayes is typically used on continuous data, which is not typical of text classification.

The source code and docstring for the NLTK Naive Bayes classifier can be found here: https://www.nltk.org/_modules/nltk/classify/naivebayes.html

Key excerpt:

A classifier based on the Naive Bayes algorithm.  In order to find the
probability for a label, this algorithm first uses the Bayes rule to
express P(label|features) in terms of P(label) and P(features|label):

|                       P(label) * P(features|label)
|  P(label|features) = ------------------------------
|                              P(features)

The algorithm then makes the 'naive' assumption that all features are
independent, given the label:

|                       P(label) * P(f1|label) * ... * P(fn|label)
|  P(label|features) = --------------------------------------------
|                                         P(features)

Rather than computing P(features) explicitly, the algorithm just
calculates the numerator for each label, and normalizes them so they
sum to one:

|                       P(label) * P(f1|label) * ... * P(fn|label)
|  P(label|features) = --------------------------------------------
|                        SUM[l]( P(l) * P(f1|l) * ... * P(fn|l) )
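
The normalization step described in the excerpt can be illustrated with a minimal pure-Python sketch. This is not NLTK's implementation: the function name `naive_bayes_posterior`, the labels, and all probability values below are hypothetical, chosen only to show the numerator for each label being computed and then normalized so the results sum to one.

```python
from math import prod


def naive_bayes_posterior(priors, likelihoods, features):
    """Return P(label | features) for every label.

    priors:      {label: P(label)}
    likelihoods: {label: {feature: P(feature | label)}}
    features:    list of observed feature names
    """
    # Numerator for each label: P(label) * P(f1|label) * ... * P(fn|label)
    numerators = {
        label: priors[label] * prod(likelihoods[label][f] for f in features)
        for label in priors
    }
    # Normalize so the values sum to one, instead of computing P(features).
    total = sum(numerators.values())
    return {label: value / total for label, value in numerators.items()}


# Hypothetical probabilities, for illustration only.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.4},
}

posterior = naive_bayes_posterior(priors, likelihoods, ["free", "meeting"])
print(posterior)
```

With these made-up numbers the numerators are 0.4 * 0.5 * 0.1 = 0.02 for "spam" and 0.6 * 0.1 * 0.4 = 0.024 for "ham", so after dividing by their sum (0.044) "ham" ends up with the higher posterior.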