SciKit-learn - Training a Gaussian Naive Bayes Classifier

Question

I am trying to plot the decision surface for a Gaussian Naive Bayes classifier. I seem to be having a bit of a problem with training the classifier though. I am also very new to machine learning.

First I generate 100 random points in two groups of 50, with each group drawn from a different coordinate range and given its own label.

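import numpy as np

point1, point1L = [], []  # points and labels for class 1
point2, point2L = [], []  # points and labels for class 0
for i in range(50):
    point1.append([np.random.randint(50,80),np.random.randint(50,80)])
    point1L.append(1)
for i in range(50):
    point2.append([np.random.randint(10,40),np.random.randint(10,70)])
    point2L.append(0)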

I then train it.

from sklearn.naive_bayes import GaussianNB

clf = GaussianNB()
clf.fit(point1,point1L)
clf.fit(point2, point2L)

I then run into a problem. The classifier doesn't seem to be able to tell the two groups of points apart.

# predict expects a 2D array: one row per sample
print(clf.predict([[np.random.randint(50,80),np.random.randint(50,80)]]))
print(clf.predict([[np.random.randint(10,40),np.random.randint(10,70)]]))

The result I get for this always seems to be:

[0]
[0]

What am I doing wrong, and how do I fix it?

On a side note, I would also like to know if I can plot the decision boundary straight from the classifier itself, and not by comparing decisions by the classifier at every point.

just switch "fit(x,y)" to "partial_fit(x,y,[0,1])" and it will work — lejlot
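A minimal sketch of what this comment suggests, assuming the same two groups of points as in the question (the fixed seed, the array construction, and the test points [65, 65] and [20, 20] are illustrative choices, not part of the original post). partial_fit updates the model incrementally instead of resetting it, and the full list of classes has to be passed on the first call:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.RandomState(0)  # fixed seed, chosen only for reproducibility

# Same ranges as in the question: class 1 and class 0
point1 = rng.randint(50, 80, size=(50, 2))
point1L = np.ones(50, dtype=int)
point2 = np.column_stack([rng.randint(10, 40, size=50),
                          rng.randint(10, 70, size=50)])
point2L = np.zeros(50, dtype=int)

clf = GaussianNB()
# The first call must list every class the model will ever see
clf.partial_fit(point1, point1L, classes=[0, 1])
# Later calls add to the existing statistics instead of resetting them
clf.partial_fit(point2, point2L)

pred_a = clf.predict([[65, 65]])  # point inside the class 1 range
pred_b = clf.predict([[20, 20]])  # point inside the class 0 range
print(pred_a, pred_b)
```

This is mainly useful when the data arrives in batches; if both groups are already in memory, a single fit on the combined data does the same job.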

Vivek Kumar · Accepted Answer · 2017-02-04T16:35:35

fit() should be called only once. You are calling it twice, once for point1 and once for point2. Each call to fit() resets the estimator, so the second call discards what was learned from point1 and trains only on point2, whose labels are all 0. That is why your predictions are always 0. Combine point1 and point2 into a single matrix (and do the same for the labels), then call fit() once on the combined data.
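The fix can be sketched as follows, reusing the variable names from the question. The fixed seed, the names X and y, and the test points [65, 65] and [20, 20] are assumptions added for illustration. Note that predict is given a 2D array here (one row per sample), which recent scikit-learn versions require:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.RandomState(0)  # fixed seed, chosen only for reproducibility

# Same two groups as in the question
point1 = [[rng.randint(50, 80), rng.randint(50, 80)] for _ in range(50)]
point1L = [1] * 50
point2 = [[rng.randint(10, 40), rng.randint(10, 70)] for _ in range(50)]
point2L = [0] * 50

# Stack both groups into one feature matrix and one label vector
X = np.array(point1 + point2)    # shape (100, 2)
y = np.array(point1L + point2L)  # shape (100,)

clf = GaussianNB()
clf.fit(X, y)  # a single call that sees both classes

pred_a = clf.predict([[65, 65]])  # point inside the class 1 range
pred_b = clf.predict([[20, 20]])  # point inside the class 0 range
print(pred_a, pred_b)
```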
