I'm trying to implement a naive Bayes (NB) classifier in R, recreating results from a dataset that was given to me. Right now I'm simply testing on the training data itself to see what the accuracy is like.
There are 29 variables in the dataset, one of which is called "Status". It has two values, Win and Lose. I've split the training data into roughly 2/3 for training and 1/3 for testing. The goal is to measure how accurately the model predicts whether Status is Win or Lose.
I think I understand the error, insofar as "Win" and "Lose" aren't numeric values, but as I understand it, wouldn't they be factors? My code is below. I'm using the bnlearn examples from http://www.bnlearn.com/documentation/man/naive.bayes.html as my basis for this. If there are better examples out there, please let me know.
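Whether a text column like Status arrives as a factor depends on how `read.csv` was told to treat strings. In R 4.0.0 and later the default is `stringsAsFactors = FALSE`, so "Win"/"Lose" are read as character, not factor. The minimal sketch below uses a made-up inline CSV (the `Score` column and the values are hypothetical stand-ins for the real file) and sets the argument explicitly both ways so the result does not depend on the R version:

```r
# Hypothetical inline CSV standing in for the real file
csv_text <- "Status,Score\nWin,3\nLose,1\nWin,2"

# Strings kept as character vectors
as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(as_chr$Status)    # "character"

# Strings converted to factors at read time
as_fac <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(as_fac$Status)    # "factor"
levels(as_fac$Status)   # "Lose" "Win"
```

Running `class()` on the real `trainingdata$Status` is a quick way to confirm which case applies.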
```r
library(bnlearn)

# Read in training data
trainingdata <- read.csv("C:\\.....filepath.csv", header = TRUE)

# Split data into training and test sets
training.set = trainingdata[1:1200, ]
test.set = trainingdata[1201:1860, ]

# Train model
bn = naive.bayes(training.set, "Status")
fitted = bn.fit(bn, training.set)

# Predict on the test set and compare with the true labels
pred = predict(fitted, test.set)
table(pred, test.set[, "Status"])
```
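Since the goal is an accuracy figure, the confusion table can be reduced to a single number: correct predictions sit on the diagonal. A minimal sketch, assuming `pred` and the true labels are factors with the same levels; the five labels below are hypothetical placeholders for `pred` and `test.set[, "Status"]`:

```r
# Hypothetical predictions and true labels sharing the same level set
lv <- c("Lose", "Win")
truth <- factor(c("Win", "Win", "Lose", "Lose", "Win"), levels = lv)
pred  <- factor(c("Win", "Lose", "Lose", "Lose", "Win"), levels = lv)

# Confusion matrix: rows are predictions, columns are true labels
conf_mat <- table(pred, truth)

# Accuracy = correctly classified (diagonal) / all cases
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy                   # 0.8

# Equivalent shortcut without building the table
mean(pred == truth)        # 0.8
```

The `diag()` approach relies on rows and columns being in the same level order, which is why both factors are given identical `levels` here.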
I start to get the error at the `bn = naive.bayes(training.set, "Status")` line. The specific error is: `Error in data.type(x) : variables must be either numeric, factors or ordered factors`
Is there a way I can get bnlearn to recognise that "Status" is a factor?
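Since bnlearn's `naive.bayes()` expects discrete (factor) variables, it can help to list every column that is not yet a factor, not just Status. A minimal sketch on a hypothetical three-column data frame (`Venue` and `Score` are invented for illustration):

```r
# Hypothetical data frame mimicking a CSV read without factor conversion
df <- data.frame(
  Status = c("Win", "Lose", "Win"),
  Venue  = factor(c("Home", "Away", "Home")),
  Score  = c(3L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Names of the columns that are not factors
not_factor <- names(df)[!vapply(df, is.factor, logical(1))]
not_factor                 # "Status" "Score"

# Converting a single column to a factor
df$Status <- factor(df$Status)
is.factor(df$Status)       # TRUE
```

Converting only Status may not be enough if other predictors, like `Score` here, are still integer or character.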
Check `str(trainingdata)` to see if your variables are of the appropriate type. – Cotton.Rockwood

In the documentation for `naive.bayes` it says all variables in the data frame must be factors. You can just add `trainingdata <- lapply(trainingdata, as.factor)`. This assumes that all the variables can be reasonably converted to factors, which should be the case anyway if you plan on using them with `naive.bayes`. – Cotton.Rockwood
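One detail worth checking with that conversion: `lapply()` returns a plain list, not a data frame, so the later `trainingdata[1:1200, ]` row subsetting would no longer work. Assigning into `trainingdata[]` keeps the data frame. Doing the conversion before the split also gives the training and test subsets identical factor levels. A minimal sketch on a hypothetical four-row data frame:

```r
# Hypothetical stand-in for trainingdata
df <- data.frame(
  Status = c("Win", "Lose", "Win", "Lose"),
  Score  = c(3L, 1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# lapply() alone returns a plain list
as_list <- lapply(df, as.factor)
class(as_list)             # "list"

# Assigning into df[] keeps the data.frame class and column names
df[] <- lapply(df, as.factor)
class(df)                  # "data.frame"

# Converting before splitting means both subsets share the same levels
train <- df[1:2, ]
test  <- df[3:4, ]
identical(levels(train$Score), levels(test$Score))  # TRUE
```

Numeric columns with many distinct values become factors with many levels this way. Binning them first (for example with bnlearn's `discretize()`) may give more sensible conditional probability tables.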