1
votes

Sorry for some grammatical mistakes and misuse of words.

I am currently working on text classification, trying to classify emails.

From my research, I found that Multinomial Naive Bayes and Bernoulli Naive Bayes are the variants most often used for text classification. Bernoulli only cares about whether a word occurs or not. Multinomial cares about the number of occurrences of the word.
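To make the difference concrete, here is a minimal sketch of the two feature representations for a single email. The tokenizer, the vocabulary, and the sample email are all made up for illustration; a real pipeline would normally use a library vectorizer instead.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace only.
    return text.lower().split()


def multinomial_features(text, vocabulary):
    # Multinomial view: how many times each vocabulary word occurs.
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def bernoulli_features(text, vocabulary):
    # Bernoulli view: 1 if the word occurs at least once, otherwise 0.
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocabulary]


# Illustrative vocabulary and email, not taken from any real dataset.
vocabulary = ["free", "meeting", "money"]
email = "Free money free prizes claim your free money now"

counts = multinomial_features(email, vocabulary)
flags = bernoulli_features(email, vocabulary)

print(counts)  # [3, 0, 2]
print(flags)   # [1, 0, 1]
```

The Bernoulli vector throws away the fact that "free" appears three times, while the multinomial vector keeps it.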

Gaussian Naive Bayes is usually used for continuous data that follows a normal distribution, e.g. height or weight. But what is the reason that we don't use Gaussian Naive Bayes for text classification? Will anything bad happen if we apply it to text classification?

2 Answers

0
votes

We choose the algorithm based on the kind of dataset we have. Bernoulli Naive Bayes is good at handling boolean/binary attributes, Multinomial Naive Bayes is good at handling discrete values such as counts, and Gaussian Naive Bayes is good at handling continuous values. Consider three scenarios:

1) Consider a dataset which has columns like has_diabetes, has_bp, has_thyroid, and you classify the person as healthy or not. In such a scenario, Bernoulli NB will work well.
2) Consider a dataset that has the marks of various students in various subjects, and you want to predict whether the student is clever or not. In this case Multinomial NB will work fine.
3) Consider a dataset that has the weights of students, and you are predicting a height class for them (for example tall or short, since Naive Bayes is a classifier and needs a class label as the target). GaussianNB will work well in this case.
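As a rough sketch of why the data type matters: the three variants differ in the per-feature likelihood they assume. The function names and the numbers below are illustrative only, and the multinomial term leaves out the multinomial coefficient, which is the same for every class and so does not affect which class wins.

```python
import math


def bernoulli_likelihood(x, p):
    # x is 0 or 1; p is the probability that the attribute is present in this class.
    return p if x == 1 else 1.0 - p


def multinomial_likelihood(counts, probs):
    # counts[i] is how often item i occurs; probs[i] is its probability in this class.
    # The multinomial coefficient is omitted (it is identical across classes).
    result = 1.0
    for count, prob in zip(counts, probs):
        result *= prob ** count
    return result


def gaussian_likelihood(x, mean, variance):
    # Density of a normal distribution with the given mean and variance at x.
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


print(bernoulli_likelihood(1, 0.3))
print(multinomial_likelihood([2, 1], [0.5, 0.5]))
print(gaussian_likelihood(0.0, 0.0, 1.0))
```

Under the naive independence assumption, a class score is the class prior multiplied by these per-feature terms, so picking a variant amounts to picking which of these formulas describes your columns best.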

-1
votes

Bayes classifiers use probabilistic rules; the three you have mentioned relate to the following rules:

You have to select the probability rule according to the data you have (or try them all).


I think that what you have read on websites or in research papers relates to the fact that email data usually follow a Bernoulli or Multinomial distribution. You can try the Gaussian distribution, and I encourage you to do so; you should figure out very quickly whether your data can be fitted by a Gaussian distribution.
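One quick way to get a feeling for this, sketched here with made-up numbers: fit a normal distribution to the counts of a single word and see how much probability it places below zero, where a word count can never be. This is only an informal sanity check, not a formal normality test, and the `word_counts` list is hypothetical.

```python
from statistics import NormalDist, mean, pstdev

# Hypothetical counts of one word across ten emails; most emails do not contain it.
word_counts = [0, 0, 0, 0, 0, 0, 0, 1, 2, 7]

# Fit a Gaussian the way Gaussian Naive Bayes would: per-feature mean and spread.
mu = mean(word_counts)
sigma = pstdev(word_counts)
fitted = NormalDist(mu, sigma)

# Probability mass the fitted Gaussian assigns to negative counts (impossible values).
mass_below_zero = fitted.cdf(0)

# Share of emails where the word does not occur at all.
share_zero = word_counts.count(0) / len(word_counts)

print(f"mean={mu}, std={sigma:.2f}")
print(f"Gaussian mass below zero: {mass_below_zero:.2f}")  # roughly 0.32 for these numbers
print(f"share of zero counts: {share_zero:.2f}")
```

With sparse, zero-heavy counts like these, a large part of the fitted bell curve sits on values the data can never take, which is a sign that the Gaussian rule is a poor match for such a feature.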

However, I would advise you to read the links above; you will understand your work better if you have a feeling for why solution A or B works better than solution C.