
I ran this simple naive bayes program:

import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])

clf = GaussianNB()
clf.fit(X, Y)
print(clf.predict([[-0.8, -1], [-0.9, -1]]))

and the result I got is:

[1 1]

The [-0.8, -1] is classified as 1, and the [-0.9, -1] is classified as 2. If I know that all of my data came from the same class, i.e., that [[-0.8, -1], [-0.9, -1]] came from the same class, is there a way for scikit-learn's naive Bayes classifier to classify this data as a whole (and give me [1] as the result in this case), rather than classifying every data point individually?

In your example above, both inputs were actually classified as class 1 (the second was not classified as class 2). – bogatron

1 Answer


The naive Bayes classifier classifies each input individually (not as a group). If you know that all of the inputs belong to the same (but unknown) class, then you need to do some additional work to get your result. One option is to select the class with the greatest count in the result from clf.predict, but that might not work well if you only have two instances in the group.
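A minimal sketch of that majority-vote option, using your training data from the question (the group variable and the vote-counting code are my additions, not part of the scikit-learn API):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])
clf = GaussianNB().fit(X, Y)

group = np.array([[-0.8, -1], [-0.9, -1]])
preds = clf.predict(group)

# Majority vote: pick the class that appears most often
# among the individual per-point predictions.
classes, counts = np.unique(preds, return_counts=True)
group_label = classes[np.argmax(counts)]
print(group_label)  # 1
```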

Another option would be to call predict_proba on the GaussianNB classifier, which returns the probabilities of all classes for each of the inputs. You can then combine the individual probabilities (e.g., you could just sum them for each class) to decide how you want to classify the group.
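For example, summing the rows of predict_proba and taking the argmax (the columns of predict_proba are ordered as in clf.classes_):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])
clf = GaussianNB().fit(X, Y)

group = np.array([[-0.8, -1], [-0.9, -1]])

# predict_proba returns one row of class probabilities per input.
proba = clf.predict_proba(group)

# Sum the probabilities over the group and pick the best class.
group_label = clf.classes_[np.argmax(proba.sum(axis=0))]
print(group_label)  # 1
```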

You could even combine the two approaches: use predict and select the class with the highest count, but use predict_proba to break a tie.