library(caret)  # provides createDataPartition
set.seed(400)
random <- createDataPartition(mldata.knn$Transport, p=0.70,list = F)
mldata_train <- mldata.knn[random,]
mldata_test <- mldata.knn[-random,]
print(table(mldata.knn$Transport))
print(table(mldata_train$Transport))
library(e1071)
NB_model = naiveBayes(mldata.knn$Gender ~., data = mldata_train)
print(NB_model)

Error in model.frame.default(formula = mldata.knn$Gender ~ ., data = mldata_train, : variable lengths differ (found for 'Age')

1 Answer

The error occurs because the variables in your formula have different lengths. However many rows mldata.knn has, mldata_train contains only about 70% of them, because createDataPartition was called with p = 0.70.

So, in the statement:

NB_model = naiveBayes(mldata.knn$Gender ~., data = mldata_train)

Here mldata.knn$Gender has the length of the original data set (mldata.knn), while the variables that the dot (.) stands for are taken from mldata_train, because the call specifies data = mldata_train. Those variables are shorter (only about 70% of the original rows), so model.frame cannot line them up with the response and reports that the variable lengths differ.
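To see the mismatch in isolation, here is a minimal sketch that uses the built-in iris data as a stand-in for mldata.knn (the names full, train, msg and mf are only illustrative, and the first 105 rows stand in for the 70% partition). It calls model.frame directly, which is the function named in your error message, so it needs nothing beyond base R:

```r
# Illustrative stand-in for mldata.knn: the built-in iris data (150 rows).
full <- iris
train <- full[1:105, ]   # 70% of the rows, standing in for mldata_train

length(full$Species)     # 150: response taken from the full data
nrow(train)              # 105: predictors taken from the training subset

# Mixing the two sources reproduces the same kind of error.
msg <- tryCatch(
  {
    model.frame(full$Species ~ ., data = train)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Naming the column on its own keeps every variable inside `train`.
mf <- model.frame(Species ~ ., data = train)
nrow(mf)
```

The first call fails because the response has 150 values while each predictor has 105; the second succeeds because the response and the predictors are all looked up in the same data frame.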

Perhaps you intended to use only the training data to build your NB classifier, referring to Gender by its column name so that it is also looked up in mldata_train:

NB_model = naiveBayes(Gender ~., data = mldata_train)
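As a rough end-to-end sketch of the same pattern, the following fits the classifier on a training subset and checks it on the held-out rows. It assumes e1071 is installed and again uses iris in place of your data, with a plain sample() split instead of createDataPartition; idx, train, test, model and pred are hypothetical names:

```r
library(e1071)

set.seed(400)
# Hypothetical stand-in data: iris, with Species as the class to predict.
idx <- sample(nrow(iris), size = round(0.70 * nrow(iris)))
train <- iris[idx, ]
test <- iris[-idx, ]

# The response is named by column only, so every variable comes from `train`.
model <- naiveBayes(Species ~ ., data = train)

# Predict on the held-out rows and cross-tabulate against the true labels.
pred <- predict(model, newdata = test)
print(table(predicted = pred, actual = test$Species))
```

With your own data the equivalent prediction step would be predict(NB_model, newdata = mldata_test), compared against mldata_test$Gender.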