0 votes

I'm trying to classify text using a naive Bayes classifier, and I also want to use k-fold cross validation to validate the classification results. But I'm still confused about how to use k-fold cross validation. As I understand it, k-fold divides the data into k subsets; one of the k subsets is used as the test set and the other k-1 subsets are put together to form the training set. And I think the training data must be labeled in order to train on it. So does k-fold cross validation require labeled data? Is that right? And what about unlabeled data?
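To make my understanding concrete, this is the kind of loop I have in mind (just a rough sketch, assuming a numeric feature matrix Features and a numeric label vector Label, and using cvpartition/fitcnb from the Statistics Toolbox):

k = 5;
cvp = cvpartition(Label, 'KFold', k);   % stratified split, needs the labels
acc = zeros(k, 1);
for i = 1:k
    trainIdx = training(cvp, i);        % k-1 folds used for training
    testIdx  = test(cvp, i);            % held-out fold used for testing
    mdl  = fitcnb(Features(trainIdx, :), Label(trainIdx));
    pred = predict(mdl, Features(testIdx, :));
    acc(i) = mean(pred == Label(testIdx));  % needs the true test labels too
end
mean(acc)                               % average accuracy over the k folds

Is this the right way to think about it?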

Typically, for any supervised learning, the data needs to be labeled. And then again, for evaluation, the data needs to be labeled. – Chthonic Project
So actually the k-fold cross validation is used in naive Bayes for training, and not for testing? – Muhammad Haryadi Futra Iskanda
Well, you will train the naive Bayes classifier with k-1 subsets. When the model is created, you will evaluate it with the remaining subset. The model will predict a class, and you can compare this predicted result with the correct result. – user

1 Answer

0 votes

For unlabeled data you must use clustering methods. For naive Bayes, maybe this code would help you:

% kfolds splits the features and the labels into k matching folds
% (cell arrays with one test and one training partition per fold)
[testF, trainF] = kfolds(Features, k);
[testL, trainL] = kfolds(Label, k);
c = size(Features);                                   % c(1) = total number of samples
for i = 1:k
    LabelTrain    = trainL{i};
    LabelTest     = testL{i};
    FeaturesTrain = trainF{i};
    FeaturesTest  = testF{i};
    nb = NaiveBayes.fit(FeaturesTrain, LabelTrain);   % train on the k-1 training folds
    Class = predict(nb, FeaturesTest);                % classify the held-out fold
    predict_Class(i) = sum(Class == LabelTest);       % correct predictions in fold i
end
predict_all = sum(predict_Class)/c(1);                % overall accuracy across all folds

The kfolds function separates your data into k folds (it is a helper function, not built into MATLAB).
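If you don't have a kfolds implementation at hand, a minimal sketch could look like the one below. This is just one possible implementation, not the original function; it is deliberately deterministic so that the two separate calls on Features and on Label produce folds that line up row by row.

function [testC, trainC] = kfolds(data, k)
% Split the rows of data into k folds, returning cell arrays with one
% test partition and one training partition per fold. The assignment is
% deterministic so that calling kfolds on Features and on Label gives
% matching folds.
    n = size(data, 1);
    foldId = mod(0:n-1, k) + 1;          % assign rows to folds 1..k in turn
    testC  = cell(1, k);
    trainC = cell(1, k);
    for i = 1:k
        testC{i}  = data(foldId == i, :);
        trainC{i} = data(foldId ~= i, :);
    end
end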

cheers