3
votes

I have a training data set with 28 variables (13 labels and 15 features) and a test data set with 15 features. I have to predict the labels for the test data set based on the features. I built a separate KNN classifier for each of the 13 labels.

Is it possible to combine these 13 individual KNN classifiers into a single multi-label classifier?

My current code for a single label:

library(class)
# hold out rows 601-800 of the training data for validation
train_from_train <- train[1:600, 2:16]
target_a_train_from_train <- train[1:600, 17]
test_from_train <- train[601:800, 2:16]
target_a_test_from_train <- train[601:800, 17]
# fit and evaluate KNN for label "a"
knn_pred_a <- knn(train = train_from_train, test = test_from_train, cl = target_a_train_from_train, k = 29)
table(knn_pred_a, target_a_test_from_train)   # confusion matrix
mean(knn_pred_a != target_a_test_from_train)  # misclassification rate
# predict label "a" for the real test set
knn_pred_a_ON_TEST <- knn(train = train[, 2:16], test = test[, 2:16], cl = train[, 17], k = 29)
knn_pred_a_ON_TEST

I scoured the internet, and the mldr package seems to be an option, but I couldn't adapt it to my needs.
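For illustration, the per-label calls above can be wrapped in one function that loops over the label columns. This is only a sketch on synthetic stand-in data: knn_multi_label is a hypothetical helper, not part of the class package, and it still runs one independent KNN vote per label rather than a joint multi-label model.

```r
library(class)

# Hypothetical helper: one knn() call per label column,
# predictions collected in a data frame (one column per label).
knn_multi_label <- function(train_x, train_y, test_x, k) {
  preds <- lapply(train_y, function(label) {
    knn(train = train_x, test = test_x, cl = factor(label), k = k)
  })
  as.data.frame(preds)
}

# Synthetic stand-in data (assumption): 15 numeric features, 3 binary labels
set.seed(1)
x <- as.data.frame(matrix(rnorm(800 * 15), ncol = 15))
y <- data.frame(a = as.integer(x$V1 > 0),
                b = as.integer(x$V2 > 0),
                c = as.integer(x$V1 + x$V2 > 0))

pred <- knn_multi_label(x[1:600, ], y[1:600, ], x[601:800, ], k = 29)

# Misclassification rate per label on the held-out rows
err <- sapply(names(pred), function(l) {
  mean(as.character(pred[[l]]) != as.character(y[601:800, l]))
})
err
```

With the real data, train_y would be the 13 label columns and train_x the 15 feature columns, so a single call returns all 13 predictions at once.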

1
Can you add the code for your KNN? Indeed, the selection of the nearest neighbours can be combined, purely theoretically speaking. However, stackoverflow.com/questions/5963269/… - CAFEBABE
@CAFEBABE Apologies for the formatting. "a" is the label for my first KNN classifier, and so forth for the others. It is the presence of a bacteria species (0/1). - Abhijeet

1 Answer

2
votes

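You can use the package RANN for this. It wraps the ANN (approximate nearest neighbour) library, so the search can be made approximate for speed; with the default eps = 0, nn2 performs an exact search.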

library(RANN)
library(reshape2)

####
## generate some sample data and randomize order
iris.knn <- iris[sample(1:150, 150), ]
# add a second class
iris.knn["Class2"] <- iris.knn[, 5] == "versicolor"
iris.knn$org.row.id <- 1:nrow(iris.knn)
train <- iris.knn[1:100, ]
test <- iris.knn[101:150, ]
##
#####
## get nearest neighbours (nn.idx holds row positions within train,
## which equal org.row.id here because train is the first 100 rows)
nn.idx <- as.data.frame(nn2(train[1:4], query = test[1:4], k = 4)$nn.idx)
## add the row id of each test observation
nn.idx$row.id <- test$org.row.id

# classes and row id of the training rows
multiclass.vec <- train[c("org.row.id", "Species", "Class2")]
# 1 row per nearest neighbour
melted <- melt(nn.idx, id.vars = "row.id")
merged <- merge(melted, multiclass.vec, by.x = "value", by.y = "org.row.id")
# aggregate a single class
aggregate(merged$Species, list(merged$row.id), function(x) names(which.max(table(x))))

#### aggregate for all classes
# convert to character first so melt can stack a factor and a logical column
merged$Species <- as.character(merged$Species)
merged$Class2 <- as.character(merged$Class2)
all.classes <- melt(merged[c("row.id", "Species", "Class2")], id.vars = "row.id")
fun.agg <- function(x) {
  if (length(x) == 0) {
    ""  # <-- default value adaptation might be needed.
  } else {
    names(which.max(table(x)))
  }
}
dcast(all.classes, row.id ~ variable, fun.aggregate = fun.agg, fill = NULL)

The aggregate call handles only a single class. Doing this step for all classes in parallel requires another melt operation, which makes the code pretty messy, as the last block shows.
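A more compact variant of the same idea, offered as a sketch rather than a definitive implementation: run nn2 once, then take a majority vote per label column in base R, which avoids the melt/dcast round trip. knn_vote_all is a hypothetical helper name. Ties are resolved by which.max, so the alphabetically first value wins, and all predictions come back as character strings.

```r
library(RANN)

# Hypothetical helper: majority vote over the k nearest training rows,
# repeated for every label column, reusing a single neighbour search.
knn_vote_all <- function(train_x, train_y, test_x, k) {
  # n_test x k matrix of row positions within train_x
  idx <- nn2(train_x, query = test_x, k = k)$nn.idx
  vote <- function(label) {
    label <- as.character(label)
    apply(idx, 1, function(rows) names(which.max(table(label[rows]))))
  }
  as.data.frame(lapply(train_y, vote), stringsAsFactors = FALSE)
}

# Same sample data as in the answer: iris with a second, logical class
set.seed(42)
iris.knn <- iris[sample(nrow(iris)), ]
iris.knn$Class2 <- iris.knn$Species == "versicolor"
train <- iris.knn[1:100, ]
test <- iris.knn[101:150, ]

pred <- knn_vote_all(train[1:4], train[c("Species", "Class2")], test[1:4], k = 4)
head(pred)
```

Because the neighbour search is shared, adding more label columns only adds cheap table lookups, not additional distance computations.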