This is my first time using scikit-learn, so apologies if the question is stupid. I'm trying to run a naive Bayes classifier on UCI's mushroom dataset so I can compare its results against my own NB classifier, which I coded from scratch.
The dataset is categorical and each feature has more than 2 possible values, so I used a multinomial NB instead of a Gaussian or Bernoulli NB.
However, I keep getting the following error: ValueError: could not convert string to float: 'l', and I'm not sure what to do. Shouldn't a multinomial NB be able to take string data?
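MultinomialNB models non-negative feature counts, so `fit` expects a numeric feature array. The minimal sketch below uses small hypothetical toy arrays (not taken from the mushroom data) to show the contrast: integer counts fit without complaint, while raw category letters fail during validation with a ValueError. The exact message text depends on the scikit-learn version.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy labels for 4 samples.
y = np.array(["p", "e", "p", "e"])

# Non-negative integer counts: the input MultinomialNB is designed for.
X_counts = np.array([[2, 0], [0, 3], [1, 1], [0, 2]])
counts_clf = MultinomialNB().fit(X_counts, y)  # fits without error

# Raw one-letter categories, like the mushroom features.
X_letters = np.array([["x", "s"], ["b", "y"], ["x", "y"], ["f", "s"]])
try:
    MultinomialNB().fit(X_letters, y)
    letters_failed = False
except ValueError as err:
    # scikit-learn cannot turn the letters into floats, so fit() rejects them
    letters_failed = True
    print("ValueError:", err)
```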
Example line of data: column 0 is the class (p for poisonous, e for edible) and the remaining 22 columns are the features.
p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
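The row layout can be checked by loading the example line with pandas. This is a minimal sketch, assuming the data is comma-separated text with no header row, as in the line above: with `header=None`, column 0 holds the class label and columns 1 to 22 hold the features, all as one-letter strings.

```python
import io

import pandas as pd

# The example row from the question, read as headerless CSV text.
line = "p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u"
df = pd.read_csv(io.StringIO(line), header=None)

labels = df.iloc[:, 0]     # class: 'p' (poisonous) or 'e' (edible)
features = df.iloc[:, 1:]  # the 22 categorical feature columns
print(df.shape)            # (1, 23)
```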
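import numpy as np
import pandas as pd
from sklearn.naive_bayes import MultinomialNB

# based off UCI's mushroom dataset http://archive.ics.uci.edu/ml/datasets/Mushroom
# data holds the dataset rows, loaded earlier (not shown)
df = pd.DataFrame(data)
# randomly put each row in the training set with probability training_percent
msk = np.random.rand(df.shape[0]) <= training_percent
train = df[msk]
test = df[~msk]
clf = MultinomialNB()
# column 0 is the class, columns 1-22 are the features; this call raises the ValueError
clf.fit(train.iloc[:, 1:], train.iloc[:, 0])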
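One way to get past the error is to encode the letters as numbers before fitting. The minimal sketch below uses a few hypothetical toy rows in the same layout (class first, then features; only 3 features to keep it short) and shows two options: one-hot encoding into 0/1 indicator columns for MultinomialNB, or integer codes for scikit-learn's CategoricalNB, which is a different estimator from the multinomial NB in the question and models each feature as its own categorical distribution.

```python
import pandas as pd
from sklearn.naive_bayes import CategoricalNB, MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy rows: column 0 is the class, the rest are one-letter features.
rows = [
    ["p", "x", "s", "n"],
    ["e", "b", "y", "w"],
    ["p", "x", "y", "n"],
    ["e", "f", "s", "w"],
]
df = pd.DataFrame(rows)
X, y = df.iloc[:, 1:], df.iloc[:, 0]

# Option 1: one-hot encode each letter into 0/1 indicator columns, then MultinomialNB.
# handle_unknown="ignore" lets predict() accept letters not seen during fit.
onehot_nb = make_pipeline(OneHotEncoder(handle_unknown="ignore"), MultinomialNB())
onehot_nb.fit(X, y)

# Option 2: map each letter to an integer code per column, then CategoricalNB.
ordinal_nb = make_pipeline(OrdinalEncoder(), CategoricalNB())
ordinal_nb.fit(X, y)

print(onehot_nb.predict(X))
print(ordinal_nb.predict(X))
```

With a random train/test split, note that OrdinalEncoder raises an error by default if a test row contains a letter that never appeared in the training rows, whereas the one-hot pipeline above simply ignores it.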