I am trying to predict the Species (3 classes) from the iris dataset:
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
I've created two numeric index vectors, tr and nw, which I use to subset iris into training data and new data, and then I run knn with k = 5:
> knn5 <- knn(iris[tr, -5], iris[nw, -5], iris$Species[nw], k = 5, prob = TRUE)
> knn5
[1] versicolor virginica virginica versicolor virginica setosa setosa setosa setosa setosa setosa setosa setosa versicolor virginica
[16] setosa setosa setosa virginica setosa setosa virginica versicolor virginica virginica versicolor setosa versicolor versicolor setosa
[31] versicolor setosa virginica setosa versicolor versicolor versicolor setosa versicolor versicolor virginica virginica virginica setosa versicolor
[46] setosa versicolor versicolor setosa versicolor
attr(,"prob")
[1] 0.4000000 0.4000000 0.4000000 0.6000000 0.4000000 0.6000000 0.6000000 0.4000000 0.3333333 0.6000000 0.6000000 0.5000000 0.6000000 0.6000000 0.6000000 0.5000000
[17] 0.4000000 0.6000000 0.4000000 0.6000000 0.6000000 0.6000000 0.6000000 0.6000000 0.6000000 0.8000000 0.4000000 0.6000000 0.6000000 0.6000000 0.4000000 0.6000000
[33] 0.4000000 0.6000000 0.8000000 0.6000000 0.6000000 0.6000000 0.6000000 0.6000000 0.6000000 0.6000000 0.6000000 0.5000000 0.6000000 0.3333333 0.4000000 0.6000000
[49] 0.6000000 0.6000000
Levels: setosa versicolor virginica
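The construction of tr and nw is not shown, so here is a minimal sketch of a setup with the same shape as the call above. Everything in it is an assumption: the seed, the use of sample(), and the sizes (50 rows each, inferred from the 50 predictions and from the fact that the label vector must have as many entries as the training set). It also assumes knn() comes from the class package. It will not reproduce the exact values printed above.

```r
library(class)  # assumption: knn() above is class::knn

set.seed(42)                                        # hypothetical seed
tr <- sample(nrow(iris), 50)                        # hypothetical training rows
nw <- sample(setdiff(seq_len(nrow(iris)), tr), 50)  # hypothetical new rows, disjoint from tr

# Same call as in the question, including the deliberately wrong label vector
knn5 <- knn(iris[tr, -5], iris[nw, -5], iris$Species[nw], k = 5, prob = TRUE)

# Tabulate the winning vote share reported for each prediction
table(round(attr(knn5, "prob"), 4))
```

Tabulating attr(knn5, "prob") makes it easy to spot any values that are not multiples of 1/5.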
I understand that the predictions are very bad because I passed the wrong label vector to knn (iris$Species[nw] instead of iris$Species[tr]); my question is not about that.
My question is: why am I getting 0.3333333 as some of the values for prob? Since we are looking at 5 neighbors, I would expect only values of the form n/5.
My initial guess was that these are places where there was a tie. However, I then realized that values of 0.4000000 must already be ties: with 5 votes spread over only 3 classes, a winning share of 0.4 means the votes split 2-2-1, so the other classes received 0.4 and 0.2. So I'm not sure about my guess anymore.
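The arithmetic above can be checked with a small sketch. The helper winning_props is hypothetical (it is not part of any package); it enumerates every way a given number of votes can be split among 3 classes and lists the possible winning shares. The second call is only a guess at what might be going on: it assumes a sixth neighbor could take part in the vote, which the class::knn help page suggests can happen, since it says that if there are ties for the kth nearest vector, all candidates are included in the vote.

```r
# Hypothetical helper: list every possible winning vote share when
# `votes` votes are split among `n_classes` classes.
winning_props <- function(votes, n_classes = 3) {
  # All combinations of per-class vote counts from 0 to `votes`
  counts <- expand.grid(rep(list(0:votes), n_classes))
  # Keep only the combinations that use exactly `votes` votes
  counts <- counts[rowSums(counts) == votes, , drop = FALSE]
  # The winning share is the largest count divided by the number of votes
  sort(unique(apply(counts, 1, max) / votes))
}

winning_props(5)  # shares possible with exactly 5 voters
winning_props(6)  # shares possible if a distance tie admitted a 6th voter (assumption)
```

With exactly 5 voters the smallest possible winning share is 2/5, so 1/3 cannot come from a 5-way vote, while it is a possible share with 6 voters (a 2-2-2 split).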