
I have a training dataset (50000 × 16) and a test dataset (5000 × 16). The 16th column in both datasets is the decision label (response); the labels in the test dataset are used to check the classification accuracy of the trained classifier. I am using my training data to train and validate a cross-validated KNN classifier. I created the cross-validated KNN classifier model using the following code:

X = Dataset2(1:50000,:); % Use some data for fitting
Y = Training_Label(1:50000,:); % Response of training data

%Create a KNN Classifier model
rng(10); % For reproducibility
Mdl = fitcknn(X,Y,'Distance', 'Cosine', 'Exponent', '', 'NumNeighbors', 10,'DistanceWeight', 'Equal', 'StandardizeData', 1);

%Construct a cross-validated classifier from the model.
CVMdl = crossval(Mdl,'KFold', 10);

%Examine the cross-validation loss, which is the average loss of each cross-validation model when predicting on data that is not used for training.
kloss = kfoldLoss(CVMdl, 'LossFun', 'ClassifError')

% Compute validation accuracy
validationAccuracy = 1 - kloss;

Now I want to classify my test data using this cross-validated KNN classifier, but I can't figure out how to do that. I have gone through the available examples in MATLAB but couldn't find a suitable function or example for doing this.

I know I can use the "predict" function to predict the class labels of my test data if my classifier is not cross-validated. The code is as follows:

X = Dataset2(1:50000,:); % Use some data for fitting
Y = Training_Label(1:50000,:); % Response of training data

%Create a KNN Classifier model
rng(10); % For reproducibility
Mdl = fitcknn(X,Y,'Distance', 'Cosine', 'Exponent', '', 'NumNeighbors', 10,'DistanceWeight', 'Equal', 'StandardizeData', 1);

%Classification using Test Data
Classifier_Output_Labels = predict(Mdl,TestDataset2(1:5000,:));

But I could not find a similar function (like "predict") for a cross-validated KNN classifier. I found the "kfoldPredict" function in the MATLAB documentation, but it says the function is used to evaluate the trained model: http://www.mathworks.com/help/stats/classificationpartitionedmodel.kfoldpredict.html I could not find any way to pass new data to this function.

So could anyone please advise me how to use the cross validated knn classifier model to predict labels of new data? Any help is appreciated and badly needed. :( :(


2 Answers

2
votes

Let's say you are doing 10-fold cross-validation while learning the model. You can use the kfoldLoss function on the cross-validated model (CVMdl in the question, not the plain Mdl) to get the CV loss for each fold, and then choose the trained model with the lowest CV loss:

modelLosses = kfoldLoss(CVMdl,'mode','individual');

This gives you a vector of length 10 (one CV error value per fold) if you used 10-fold cross-validation. Assuming the trained model with the lowest CV error is the k-th one, you would then use:

[~, k] = min(modelLosses); % index of the fold model with the lowest loss
testSetPredictions = predict(CVMdl.Trained{k}, testSetFeatures);
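
For reference, here is a minimal end-to-end sketch of the steps above. It assumes synthetic two-class data in place of the question's datasets; the names Xtrain, Ytrain, Xtest and Ytest are made up for illustration, and the default Euclidean distance is used instead of the question's cosine distance. It needs the Statistics and Machine Learning Toolbox.

```matlab
rng(10);                                      % for reproducibility

% Synthetic stand-in data (an assumption, not the asker's data):
% two Gaussian classes in 15 dimensions
Xtrain = [randn(500,15); randn(500,15) + 1];
Ytrain = [zeros(500,1);  ones(500,1)];
Xtest  = [randn(50,15);  randn(50,15) + 1];
Ytest  = [zeros(50,1);   ones(50,1)];

% Fit a KNN model, then build the 10-fold cross-validated model from it
Mdl   = fitcknn(Xtrain, Ytrain, 'NumNeighbors', 10, 'Standardize', true);
CVMdl = crossval(Mdl, 'KFold', 10);

% One loss value per fold; pick the fold model with the smallest loss
modelLosses = kfoldLoss(CVMdl, 'Mode', 'individual');
[~, k] = min(modelLosses);

% CVMdl.Trained{k} is an ordinary trained classifier, so predict works on it
testSetPredictions = predict(CVMdl.Trained{k}, Xtest);
testAccuracy = mean(testSetPredictions == Ytest);
fprintf('Fold %d model, test accuracy: %.3f\n', k, testAccuracy);
```

Note that each fold model was trained on only 9/10 of the training data, so its predictions can differ slightly from those of a model fitted on the full training set.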
1
votes

You seem to be confusing things here. Cross-validation is a tool for model selection and evaluation. It is not a training procedure per se. Consequently, you cannot "use" a cross-validated object; you predict using a trained object. Cross-validation is a way of estimating the generalization capability of a given model. It has nothing to do with the actual training; it is rather a small statistical experiment to assess a particular property.
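
To make the distinction concrete, here is a minimal sketch, again assuming synthetic data (Xtrain, Ytrain, Xtest and Ytest are illustrative names, not the asker's variables). Cross-validation is used only to estimate accuracy, while the model fitted on the whole training set is the one that classifies new data.

```matlab
rng(10);                                      % for reproducibility

% Synthetic stand-in data (an assumption): two Gaussian classes, 15 features
Xtrain = [randn(500,15); randn(500,15) + 1];
Ytrain = [zeros(500,1);  ones(500,1)];
Xtest  = [randn(50,15);  randn(50,15) + 1];
Ytest  = [zeros(50,1);   ones(50,1)];

% 1) Train the classifier on ALL of the training data
Mdl = fitcknn(Xtrain, Ytrain, 'NumNeighbors', 10, 'Standardize', true);

% 2) Cross-validate only to estimate how well this kind of model generalizes
CVMdl = crossval(Mdl, 'KFold', 10);
validationAccuracy = 1 - kfoldLoss(CVMdl);

% kfoldPredict returns out-of-fold labels for the TRAINING observations;
% it does not accept new data
outOfFoldLabels = kfoldPredict(CVMdl);

% 3) Classify new data with the trained model, not the cross-validated object
testLabels   = predict(Mdl, Xtest);
testAccuracy = mean(testLabels == Ytest);
fprintf('CV accuracy: %.3f, test accuracy: %.3f\n', validationAccuracy, testAccuracy);
```

If the cross-validated accuracy and the test accuracy are close, the cross-validation estimate was a reasonable guide to performance on unseen data.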