How to specify the prior probability for scikit-learn's Naive Bayes

Question

I'm using the scikit-learn machine learning library (Python) for a machine learning project. One of the algorithms I'm using is the Gaussian Naive Bayes implementation. One of the attributes of the GaussianNB estimator is the following:

class_prior_ : array, shape (n_classes,)

I want to alter the class prior manually, since the data I use is very skewed and the recall of one of the classes is very important. By assigning a high prior probability to that class, the recall should increase.

However, I can't figure out how to set the attribute correctly. I've already read the topics below, but their answers don't work for me.

How can the prior probabilities manually set for the Naive Bayes clf in scikit-learn?

How do I know what prior's I'm giving to sci-kit learn? (Naive-bayes classifiers.)

This is my code:

gnb = GaussianNB()
gnb.class_prior_ = [0.1, 0.9]
gnb.fit(data.XTrain, yTrain)
yPredicted = gnb.predict(data.XTest)

I figured this was the correct syntax, and that I could find out which class belongs to which position in the array by playing with the values. However, the results remain unchanged, and no errors are raised.

What is the correct way of setting the attributes of the GaussianNB algorithm from the scikit-learn library?

Link to the scikit documentation of GaussianNB

Jianxun Li · Accepted Answer · 2015-06-17T16:14:40

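The GaussianNB() implemented in scikit-learn did not allow you to set the class prior when this answer was written (later releases added a priors constructor parameter for this purpose). If you read the online documentation, you will see that class_prior_ is a fitted attribute rather than a parameter. It only becomes available once you fit the GaussianNB(), and fit() recomputes it every time, so a value assigned beforehand is silently overwritten. By default it is simply the relative frequency of each class label in your training sample.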

from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB


# simulate data with unbalanced weights
X, y = make_classification(n_samples=1000, weights=[0.1, 0.9])
# your GNB estimator
gnb = GaussianNB()
gnb.fit(X, y)

gnb.class_prior_
Out[168]: array([ 0.105,  0.895])

gnb.get_params()
Out[169]: {}

As you can see, the estimator infers the priors from the class frequencies, so the imbalance in the training data is already taken into account. You do not have to specify the priors manually for that.
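If you still want priors that differ from the empirical class frequencies, the following is a minimal sketch. It assumes a scikit-learn release in which the GaussianNB constructor accepts a priors argument (this was added after the answer above was written). The values [0.5, 0.5] and the random_state are illustrative choices, not recommendations. It also shows why the assignment in the question had no effect.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# Simulated imbalanced data: roughly 10% class 0 and 90% class 1.
X, y = make_classification(n_samples=1000, weights=[0.1, 0.9], random_state=0)

# Default behaviour: priors are estimated from the class frequencies in y.
gnb_default = GaussianNB().fit(X, y)
empirical = np.bincount(y) / len(y)

# Explicit priors, given in the order of the sorted class labels (classes_).
# Assumption: the installed scikit-learn supports the `priors` parameter.
gnb_custom = GaussianNB(priors=[0.5, 0.5]).fit(X, y)

# Assigning class_prior_ before fit() does not stick: fit() overwrites it.
gnb_overwritten = GaussianNB()
gnb_overwritten.class_prior_ = [0.5, 0.5]
gnb_overwritten.fit(X, y)

print("classes:          ", gnb_default.classes_)
print("default priors:   ", gnb_default.class_prior_)
print("custom priors:    ", gnb_custom.class_prior_)
print("overwritten prior:", gnb_overwritten.class_prior_)
```

Raising the prior of the minority class can only move predictions toward that class, which tends to increase its recall at the cost of precision. It is worth checking both on held-out data.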
