I'm a new Python user and have been fitting a Naive Bayes classifier with scikit-learn. Is the following example code from the scikit-learn Naive Bayes documentation page correct?
from sklearn import datasets
iris = datasets.load_iris()
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
y_pred = gnb.fit(iris.data, iris.target).predict(iris.data)
print("Number of mislabeled points out of a total %d points : %d"
      % (iris.data.shape[0], (iris.target != y_pred).sum()))
Shouldn't the gnb.fit() call instead read:
y_pred = gnb.fit(iris.data.drop(columns=['target']), iris.target).predict(iris.data)
That is, the response variable needs to be manually removed from the predictor dataset. I was getting unreasonably high accuracy metrics for my model when a colleague pointed out that the code I had cribbed from the scikit-learn documentation page is wrong.
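One way to test this concern is to inspect what iris.data actually holds before changing the fit call. The sketch below assumes the default load_iris() return value (a Bunch whose data and target are separate NumPy arrays, i.e. as_frame=False). The 50/50 split and random_state=0 are arbitrary choices for illustration. It also compares accuracy on the training rows with accuracy on held-out rows, since scoring a model on the data it was fitted to is another common cause of optimistic numbers.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()

# With the default arguments, data and target are separate NumPy arrays.
print(type(iris.data).__name__, iris.data.shape)  # ndarray (150, 4)
print(iris.feature_names)                         # the four predictor names
print(iris.target.shape)                          # (150,)

# A NumPy array has no .drop() method, so the proposed call would raise
# AttributeError unless the data were loaded as a pandas DataFrame.
print(hasattr(iris.data, "drop"))

# Hold out half of the rows so the model is scored on data it has not seen.
# test_size and random_state are illustrative assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.5, random_state=0
)

gnb = GaussianNB()
gnb.fit(X_train, y_train)

train_acc = gnb.score(X_train, y_train)  # accuracy on the rows used for fitting
test_acc = gnb.score(X_test, y_test)     # accuracy on held-out rows
print("train accuracy: %.3f, test accuracy: %.3f" % (train_acc, test_acc))
```

If the first check shows only the four feature columns, the target is not leaking through iris.data, and a gap between the training and held-out accuracy would be the more likely explanation for suspiciously high scores.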