I'm trying to classify text with a Naive Bayes classifier, and I also want to use k-fold cross-validation to validate the classification results. But I'm still confused about how to use k-fold cross-validation. As I understand it, k-fold divides the data into k subsets; one of the k subsets is used as the test set, and the other k-1 subsets are combined to form the training set. Since the training data must be labeled for the model to learn from it, does k-fold cross-validation require labeled data? Is that right? And what about unlabeled data?
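The partitioning described above can be sketched like this (Python, just to illustrate the index bookkeeping; `kfold_indices` is a made-up helper name, not a library function):

```python
def kfold_indices(n, k):
    """Return k (test_idx, train_idx) pairs covering all n items."""
    # Round-robin assignment of the n sample indices to k folds.
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test_idx = folds[i]  # fold i is held out as the test set
        # The other k-1 folds are merged into the training set.
        train_idx = [j for f in range(k) if f != i for j in folds[f]]
        splits.append((test_idx, train_idx))
    return splits

splits = kfold_indices(10, 5)
# Every sample appears in exactly one test fold across the k splits.
```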
Typically, for any supervised learning, the training data needs to be labeled. And for evaluation, the data needs to be labeled as well.
– Chthonic Project
So k-fold cross-validation is actually used with Naive Bayes for training, and not for testing?
– Muhammad Haryadi Futra Iskanda
Well, you will train the Naive Bayes model with k-1 subsets. Once the model is created, you evaluate it on the remaining subset: the model predicts a class for each test example, and you compare the predicted result with the correct one.
– user
1 Answer
For unlabeled data you would have to use clustering methods. For Naive Bayes, maybe this code will help you:
% kfolds (a helper function) splits the features and labels into
% k matching test/train partitions, returned as cell arrays.
[testF, trainF] = kfolds(Features, k);
[testL, trainL] = kfolds(Label, k);
c = size(Features);              % c(1) = total number of samples
predict_Class = zeros(1, k);     % correct predictions per fold
for i = 1:k
    LabelTrain    = trainL{i};
    LabelTest     = testL{i};
    FeaturesTrain = trainF{i};
    FeaturesTest  = testF{i};
    % Train on the k-1 training folds, then predict the held-out fold.
    nb = NaiveBayes.fit(FeaturesTrain, LabelTrain);
    Class = predict(nb, FeaturesTest);
    predict_Class(i) = sum(Class == LabelTest);  % correct in this fold
end
% Overall accuracy: total correct predictions over all samples.
predict_all = sum(predict_Class) / c(1);
The kfolds function separates your data into k folds.
cheers
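For comparison, here is a rough Python sketch of the same loop (not the answerer's code). The `fit`/`predict` pair below is a trivial majority-class stand-in for the Naive Bayes model, just to keep the example self-contained; in a real run you would fit an actual classifier on the training fold instead:

```python
from collections import Counter

def kfold_accuracy(features, labels, k, fit, predict):
    """Train on k-1 folds, test on the held-out fold, return overall accuracy."""
    n = len(labels)
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin folds
    correct = 0
    for i in range(k):
        test_idx = set(folds[i])
        train_idx = [j for j in range(n) if j not in test_idx]
        model = fit([features[j] for j in train_idx],
                    [labels[j] for j in train_idx])
        for j in folds[i]:
            # Compare each prediction on the held-out fold with the true label.
            correct += predict(model, features[j]) == labels[j]
    return correct / n  # total correct over all samples, like predict_all above

# Stand-in "classifier": always predicts the most common training label.
fit = lambda X, y: Counter(y).most_common(1)[0][0]
predict = lambda model, x: model

acc = kfold_accuracy(list(range(12)), ['a'] * 8 + ['b'] * 4, 3, fit, predict)
```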