
I'm trying to implement a naive Bayes (NB) classifier in R to recreate the results for some data I've been given. Right now I'm simply testing on a held-out part of the training data itself, to see what the accuracy is like.

There are 29 variables in the dataset, one of which is called "Status". It has two values, Win and Lose. I've split the training data into roughly 2/3 for training and 1/3 for testing. The goal is to measure how accurately the model predicts whether Status is Win or Lose.

I think I understand the error, insofar as "Win" and "Lose" aren't numeric values, but as I understand it, wouldn't they be factors? I'll post my code below. I'm using the bnlearn examples from http://www.bnlearn.com/documentation/man/naive.bayes.html as the basis for this. If there are better examples out there, please let me know.
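Whether a text column such as "Status" arrives as a factor depends on how the file was read. Since R 4.0.0, read.csv() defaults to stringsAsFactors = FALSE, so text columns come back as character vectors unless you ask for factors. A minimal sketch using a small made-up CSV (the Score column and all values are hypothetical stand-ins for the real file):

```r
# Small inline CSV standing in for the real file (hypothetical columns and values)
csv.text <- "Status,Score
Win,3
Lose,1
Win,2"

# Default read.csv(): in R >= 4.0.0, text columns stay character vectors
as.read <- read.csv(text = csv.text)
sapply(as.read, class)

# stringsAsFactors = TRUE turns Status into a factor; Score stays integer
as.factors <- read.csv(text = csv.text, stringsAsFactors = TRUE)
sapply(as.factors, class)
levels(as.factors$Status)
```

Note that stringsAsFactors only affects text columns: integer columns such as Score are left as integers, so they still need converting separately if every variable has to be a factor.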

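#Load bnlearn (provides naive.bayes(), bn.fit() and the predict() method used below)
library(bnlearn)

#Read in training data
trainingdata <- read.csv("C:\\.....filepath.csv", header = TRUE)

#Split data into training (rows 1-1200) and test (rows 1201-1860) sets
training.set = trainingdata[1:1200, ]
test.set = trainingdata[1201:1860, ]

#Train model: naive Bayes structure with Status as the class node, then fit its parameters
bn = naive.bayes(training.set, "Status")
fitted = bn.fit(bn, training.set)

#Predict Status on the test set and cross-tabulate against the observed values
pred = predict(fitted, test.set)
table(pred, test.set[, "Status"])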
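The last line, table(pred, test.set[, "Status"]), produces a confusion table, and accuracy is the proportion of cases on its diagonal. A minimal sketch with made-up predicted and observed labels (hypothetical stand-ins for pred and test.set[, "Status"]), assuming both share the same factor levels so the table is square and the diagonal lines up:

```r
# Hypothetical predicted and observed labels with identical levels
levs <- c("Lose", "Win")
pred     <- factor(c("Win", "Win", "Lose", "Lose", "Win", "Lose", "Win", "Lose"), levels = levs)
observed <- factor(c("Win", "Lose", "Lose", "Lose", "Win", "Lose", "Win", "Win"), levels = levs)

# Confusion table: rows are predictions, columns are observed classes
conf.mat <- table(pred, observed)
print(conf.mat)

# Accuracy = correctly classified cases (diagonal) / all cases
accuracy <- sum(diag(conf.mat)) / sum(conf.mat)
accuracy  # 0.75
```

The same number can be obtained directly with mean(pred == observed); the table is still worth printing because it shows which class the errors fall in.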

The error first appears at the bn = naive.bayes(training.set, "Status") line. The exact message is: "Error in data.type(x) : variables must be either numeric, factors or ordered factors"

Is there a way I can get bnlearn to recognise that "Status" is a factor?

Try looking at str(trainingdata) to see if your variables are of the appropriate type. – Cotton.Rockwood
My guess is that "Status" is a character vector. – tchakravarty
@Cotton.Rockwood If I do that, it says that all of the variables are factors with various levels, except for two of them, which are ints. Would it be the two int ones that are causing the problem? – Eoin
You should provide a reproducible example with sample input that we could run to see what's going on. – MrFlick
I would guess so. The documentation for naive.bayes says all variables in the data frame must be factors. You can just add trainingdata[] <- lapply(trainingdata, as.factor) (the [] keeps the result a data frame; a bare lapply() returns a plain list). This assumes that all the variables can be reasonably converted to factors, which should be the case anyway if you plan on using them with naive.bayes. – Cotton.Rockwood
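The str() check suggested in these comments can also be done programmatically. A small sketch on a made-up data frame (the columns Venue, Goals and Shots are hypothetical) that lists the non-factor columns and shows why the conversion is assigned into df[] rather than back to df with a bare lapply():

```r
# Hypothetical data frame: mostly factors plus two integer columns
df <- data.frame(
  Status = factor(c("Win", "Lose", "Win", "Lose")),
  Venue  = factor(c("Home", "Away", "Away", "Home")),
  Goals  = c(2L, 0L, 3L, 1L),
  Shots  = c(10L, 4L, 12L, 7L)
)

# Columns that would need converting before calling naive.bayes()
non.factor <- names(df)[!vapply(df, is.factor, logical(1))]
non.factor

# Pitfall: lapply() on its own returns a plain list, not a data frame,
# so later row indexing such as x[1:1200, ] would fail
converted.list <- lapply(df, as.factor)
class(converted.list)

# Assigning into df[] keeps the data.frame and converts every column
df[] <- lapply(df, as.factor)
str(df)
```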

1 Answer


Coming back to this years later, as I realised I never updated it with the answer and no one has since posted a solution for me to accept.

Cotton.Rockwood's assumption was correct. With bnlearn's naive Bayes classifier, all the variables need to be factors (including the two integer columns); otherwise you will get this error.
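A minimal end-to-end sketch of the fix, assuming the same workflow as the question. The data are made up (Venue and Goals are hypothetical columns). Every column is converted with as.factor() before the split, so the training and test sets share identical levels, which is generally what a fitted discrete network expects of new data. The bnlearn calls only run if the package is installed:

```r
# Hypothetical stand-in data: factor response, factor and integer predictors
set.seed(1)
n <- 300
full <- data.frame(
  Status = factor(sample(c("Win", "Lose"), n, replace = TRUE)),
  Venue  = factor(sample(c("Home", "Away"), n, replace = TRUE)),
  Goals  = sample(0:3, n, replace = TRUE)   # integer column
)

# Convert every column to a factor BEFORE splitting
full[] <- lapply(full, as.factor)

# Roughly 2/3 training, 1/3 testing, taking rows in order as in the question
n.train <- round(2 / 3 * n)
training.set <- full[1:n.train, ]
test.set     <- full[(n.train + 1):n, ]

# Row subsetting keeps all levels, so both sets have matching level sets
stopifnot(identical(lapply(training.set, levels), lapply(test.set, levels)))

# Fit and predict with bnlearn, only if it is installed
if (requireNamespace("bnlearn", quietly = TRUE)) {
  bn     <- bnlearn::naive.bayes(training.set, "Status")
  fitted <- bnlearn::bn.fit(bn, training.set)
  pred   <- predict(fitted, test.set)
  print(table(pred, test.set[, "Status"]))
}
```

Converting after splitting also works in many cases, but a level that happens to be missing from one subset would then produce different level sets in the two data frames, which converting first avoids.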