
I want to use the data defined below (it has two labels) with KNN for training, testing, and cross-validation. I could not find any useful MATLAB tutorials, so I would appreciate any help.

Imagine I have

Data=rand(2000,2);
Lables=[ones(1000,1);-1*ones(1000,1)]; 

I want to use KNN and have:

  • 50% of the data for training
  • 25% cross-validation
  • 25% testing

1 Answer


The data you gave is not a good data set: both classes are drawn from the same distribution, so nothing separates them. You should use something like

Data = [rand(1000,2)+delta;rand(1000,2)-delta];

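The larger delta is, the easier the classification becomes (once delta exceeds 0.5 the two classes no longer overlap at all). The idea behind kNN is that it needs no explicit training step: the labeled examples themselves are the model.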

Suppose you have a dataset with N labeled values. Now suppose you have an entry which you wish to classify.

If you consider the 1-NN classifier, you calculate the distance between the input and each of the N labeled training examples. The input is assigned the label of the example with the shortest distance.
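
To make the 1-NN rule concrete, here is a minimal sketch that needs no toolbox. The four training points, their labels, and the query point are values I made up for illustration, and Euclidean distance is an assumption (any other metric would work the same way).

```matlab
% Tiny hand-made training set: two points per class (illustrative values)
train  = [0 0; 0 1; 3 3; 3 4];
labels = [1; 1; -1; -1];
query  = [2.5 3.2];               % the entry we want to classify

% Euclidean distance from the query to every training example
diffs = bsxfun(@minus, train, query);
dists = sqrt(sum(diffs.^2, 2));

% 1-NN: copy the label of the closest training example
[~, nearest] = min(dists);
predicted = labels(nearest);
fprintf('Nearest example is row %d, predicted label %d\n', nearest, predicted);
```

Here the closest example is the third row, so the query gets the label -1.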

In the k-NN classifier, you look at the labels of the k examples with the shortest distances. The class that holds the majority among those k neighbors wins.
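
The majority vote can be sketched by hand in the same way. Again the data are made-up illustrative values and Euclidean distance is an assumption; mode is used for the vote so the same line would also work with more than two classes.

```matlab
% Hand-made training set: three points per class (illustrative values)
train  = [0 0; 0 1; 1 0; 3 3; 3 4; 4 3];
labels = [1; 1; 1; -1; -1; -1];
query  = [1.8 1.8];
k = 3;                             % odd, so a two-class vote cannot tie

% Distance from the query to every training example
dists = sqrt(sum(bsxfun(@minus, train, query).^2, 2));

% Indices of the k closest examples
[~, order] = sort(dists);          % ascending order
neighbors = order(1:k);

% Majority vote over the neighbors' labels
votes = labels(neighbors);
predicted = mode(votes);
fprintf('Votes: %s -> predicted label %d\n', mat2str(votes'), predicted);
```

In this example the single nearest point belongs to class -1, but two of the three nearest belong to class 1, so 3-NN predicts 1 where 1-NN would predict -1. That is exactly the smoothing effect of using k > 1.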

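In MATLAB you can use either knnsearch to find the indices of the k nearest neighbors, or knnclassify to get the label directly. (knnclassify belongs to the Bioinformatics Toolbox and, as far as I know, is not available in recent releases; fitcknn from the Statistics and Machine Learning Toolbox does the same job there.)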
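
If you want the labels directly and have the Statistics and Machine Learning Toolbox, a minimal sketch with fitcknn and predict looks like this. The training points and the two query points are made-up illustrative values, not the data from the question.

```matlab
% Hand-made training set: three points per class (illustrative values)
train  = [0 0; 0 1; 1 0; 3 3; 3 4; 4 3];
labels = [1; 1; 1; -1; -1; -1];

% Build a 3-nearest-neighbor model; no real fitting happens,
% the model mostly just stores the labeled examples
mdl = fitcknn(train, labels, 'NumNeighbors', 3);

% Classify two new points, one near each cluster
queries = [0.2 0.3; 3.5 3.5];
predicted = predict(mdl, queries);
disp(predicted')
```

The first query lies next to the class 1 points and the second next to the class -1 points, so the predictions come out as 1 and -1.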

Here is an example using knnsearch:

delta = 0.3;
N1 = 50;
N2 = 50;
Data1 = rand(1000,2)+delta;
Data2 = rand(1000,2)-delta;
train = [Data1(1:N1,:);Data2(1:N2,:)]; % create a training set
labels = [ones(N1,1);-1*ones(N2,1)]; % create labels for the training
plot(train(1:N1,1),train(1:N1,2),'xb',train(N1+1:end,1),train(N1+1:end,2),'or')
k = 7; % use an odd k so a two-class vote cannot tie
idx = knnsearch(train,Data1(N1+1:end,:),'K',k); % classify for the rest of data 1
res1 = 0;
for i=1:size(idx,1)
    if sum(labels(idx(i,:))) < 0 % majority of the k neighbors are labeled -1
        res1 = res1 + 0; % wrong answer
    else
        res1 = res1 + 1; % correct answer
    end
end
idx2 = knnsearch(train,Data2(N2+1:end,:),'K',k); % classify for the rest of data 2
res2 = 0;
for i=1:size(idx2,1)
    if sum(labels(idx2(i,:))) > 0 % majority of the k neighbors are labeled +1
        res2 = res2 + 0; % wrong answer
    else
        res2 = res2 + 1; % correct answer
    end
end
nCorrect = res1+res2; % not named corr, which would shadow the toolbox function corr
nTotal = size(idx2,1)+size(idx,1);
fprintf('Classified %d right out of %d. %.2f%% correct\n',nCorrect,nTotal,nCorrect / nTotal * 100)
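
For the 50% / 25% / 25% split asked about in the question, one possible sketch is below. I am assuming that "cross-validation" there means a held-out validation set used to choose k, which is only one reading of the question. The names trainIdx, valIdx, testIdx, and candidates are my own, and the list of candidate k values is arbitrary.

```matlab
% Two-class data as suggested above, with labels in matching order
delta  = 0.3;
Data   = [rand(1000,2)+delta; rand(1000,2)-delta];
Labels = [ones(1000,1); -1*ones(1000,1)];

% Shuffle the rows so every subset contains a mix of both classes
n      = size(Data,1);
perm   = randperm(n);
nTrain = round(0.50*n);
nVal   = round(0.25*n);
trainIdx = perm(1:nTrain);                  % 50% training
valIdx   = perm(nTrain+1:nTrain+nVal);      % 25% validation
testIdx  = perm(nTrain+nVal+1:end);         % 25% testing
trainLabels = Labels(trainIdx);

% Choose k on the validation set (odd values only, to avoid ties)
candidates = [1 3 5 7 9];
valAcc = zeros(size(candidates));
for c = 1:numel(candidates)
    idx  = knnsearch(Data(trainIdx,:), Data(valIdx,:), 'K', candidates(c));
    pred = sign(sum(trainLabels(idx), 2));  % majority vote for labels +1/-1
    valAcc(c) = mean(pred == Labels(valIdx));
end
[~, best] = max(valAcc);
bestK = candidates(best);

% Report accuracy once on the untouched test set
idx  = knnsearch(Data(trainIdx,:), Data(testIdx,:), 'K', bestK);
pred = sign(sum(trainLabels(idx), 2));
testAcc = mean(pred == Labels(testIdx));
fprintf('Best k = %d, test accuracy = %.2f%%\n', bestK, 100*testAcc);
```

The test set is touched only once, after k has been fixed, so the reported accuracy is not biased by the choice of k. If you prefer k-fold cross-validation over a single held-out set, cvpartition from the same toolbox can generate the folds.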