I'm training a Gaussian Naive Bayes model on a Pandas data frame, but I get an error when I call precision_recall_curve. As I read the documentation, precision_recall_curve takes the predicted probabilities as input, so I would expect the code below to work. xtrain and xtest are Pandas data frames with 736 and 184 rows respectively; ytrain and ytest are Series with 736 and 184 rows respectively:
from sklearn.metrics import precision_recall_curve
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()
nb.fit(xtrain, ytrain)
predicted = nb.predict_proba(xtest)
precision, recall, threshold = precision_recall_curve(ytest, predicted)
However, this raises "IndexError: index 230 is out of bounds for size 184". If I instead pass the output of predict:
predicted = nb.predict(xtest)
precision, recall, threshold = precision_recall_curve(ytest, predicted)
Then it runs without error. 184 is the number of rows in xtest and ytest, but 230 does not match any dimension of those structures. Can someone explain the difference between the two calls, and how I should be using precision_recall_curve with predicted probabilities?
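For reference, here is a minimal sketch of what differs between the two calls. It uses synthetic data from make_classification as a stand-in for my frames, so the data, the 5 features, and the assumption that the positive class is labelled 1 are all illustrative. predict_proba returns one column per class, so its output is 2-D, while predict returns a 1-D array. precision_recall_curve takes a 1-D array of scores for the positive class, so selecting a single column is probably the intended usage:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary data standing in for the real frames
# (920 rows split into 736 train / 184 test, as in the question).
X, y = make_classification(n_samples=920, n_features=5, random_state=0)
xtrain, xtest, ytrain, ytest = train_test_split(
    X, y, test_size=184, random_state=0
)

nb = GaussianNB()
nb.fit(xtrain, ytrain)

proba = nb.predict_proba(xtest)  # shape (184, 2): one column per class
labels = nb.predict(xtest)       # shape (184,): hard class labels

# Column order follows nb.classes_; pick the positive-class column
# (assumed here to be the class labelled 1).
pos_col = list(nb.classes_).index(1)
scores = proba[:, pos_col]       # shape (184,)

precision, recall, threshold = precision_recall_curve(ytest, scores)
```

Passing the hard labels from predict also runs, but the curve then has only a couple of thresholds, since the scores take just two distinct values.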