I'm trying to run the code below on the biopsy dataset in R. The idea is to build a k-nearest-neighbours (kNN) classifier.

I appreciate your help.

When I run the code I get an error and some warnings, e.g. "NAs introduced by coercion".

acc = c(1:100)*0
for (i in 1:100) {

L<- sample(1:nrow(biopsy_sem_NA_nas_Linhas),round(nrow(biopsy_sem_NA_nas_Linhas)/3))

train_sem_NA_Linhas = biopsy_sem_NA_nas_Linhas[-L,2:11]
test_sem_NA_Linhas = biopsy_sem_NA_nas_Linhas[L,2:11]

cl = factor( biopsy_sem_NA_nas_Linhas[-L, 11])
fit = knn(train_sem_NA_Linhas, test_sem_NA_Linhas, cl, k = 5)

c_matrix = table(fit[1:length(L)], factor(biopsy_sem_NA_nas_Linhas[L, 11]))
acc[i] = cat('Accurancy:', sum(diag(c_matrix))/sum(c_matrix)*100, '%')
}
mean(acc)

The following error and warnings appeared:

Error in knn(train_sem_NA_Linhas, test_sem_NA_Linhas, cl, k = 5) :
  NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In knn(train_sem_NA_Linhas, test_sem_NA_Linhas, cl, k = 5) :
  NAs introduced by coercion
2: In knn(train_sem_NA_Linhas, test_sem_NA_Linhas, cl, k = 5) :
  NAs introduced by coercion

1 Answer

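The train and test data frames should not include the factor column (the last one, column 11). With that column present, knn cannot convert the inputs to numbers, which is where the "NAs introduced by coercion" warnings and the NA/NaN/Inf error come from. The class labels are supplied only through cl. You can check this in the documentation of the knn function (?knn).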
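
To see why a factor column breaks the conversion, here is a minimal sketch on a toy data frame. The values and the name df are made up for illustration, and the assumption is that knn converts its inputs in a broadly similar way:

```r
# Toy data (hypothetical values): one numeric predictor and one factor column
df <- data.frame(V1 = c(5, 3, 8),
                 class = factor(c("benign", "malignant", "benign")))

# A single non-numeric column turns the whole matrix into character
m <- as.matrix(df)
is.character(m)   # TRUE

# Turning the labels into numbers yields NA (normally with the warning
# "NAs introduced by coercion"; suppressed here to keep the output clean)
bad <- suppressWarnings(as.double(m[, "class"]))
anyNA(bad)        # TRUE

# Without the factor column the matrix stays numeric
ok <- as.matrix(df[, "V1", drop = FALSE])
is.numeric(ok)    # TRUE
```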

I removed the cat part: cat only prints its arguments and returns NULL, so there is nothing to store in acc[i], and you can't take the mean of a string anyway. If you really want a vector of messages, try paste instead.
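
A small sketch of the difference, using made-up accuracy values: keep the numbers in a numeric vector for mean, and build the messages separately with paste0 (a variant of paste without a separator).

```r
# Hypothetical accuracies from three repetitions
acc <- c(96.5, 97.1, 95.8)

# cat() prints to the console and returns NULL, so it cannot fill acc[i]
res <- cat("Accuracy:", acc[1], "%\n")
is.null(res)   # TRUE

# paste0() returns character strings that can be stored and printed later
msgs <- paste0("Accuracy: ", acc, " %")
msgs[1]        # "Accuracy: 96.5 %"

# The mean is taken on the numeric vector, not on the messages
mean(acc)
```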

library(MASS)
library(tidyverse)
library(class)

data("biopsy")

biopsy_sem_NA_nas_Linhas <- biopsy %>%
  na.omit() # create the biopsy data without NAs ("sem NA")

acc = numeric(100) # one accuracy value per repetition
for (i in 1:100) {

  L<- sample(1:nrow(biopsy_sem_NA_nas_Linhas),round(nrow(biopsy_sem_NA_nas_Linhas)/3))
  # columns 2:10 are the predictors; column 1 (ID) and column 11 (the class factor) are left out
  train_sem_NA_Linhas = biopsy_sem_NA_nas_Linhas[-L,2:10]
  test_sem_NA_Linhas = biopsy_sem_NA_nas_Linhas[L,2:10]

  cl = factor( biopsy_sem_NA_nas_Linhas[-L, 11])
  fit = knn(train_sem_NA_Linhas, test_sem_NA_Linhas, cl = cl, k = 5)

  c_matrix = table(fit[1:length(L)], factor(biopsy_sem_NA_nas_Linhas[L, 11]))
  acc[i] = sum(diag(c_matrix))/sum(c_matrix)*100
}
mean(acc)
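
As a side note, the accuracy taken from the diagonal of the confusion matrix is the same as the share of predictions that match the true labels. A minimal sketch with made-up labels (truth and pred are hypothetical names, not objects from the code above):

```r
# Hypothetical true labels and predictions, sharing the same factor levels
truth <- factor(c("benign", "malignant", "benign", "benign"))
pred  <- factor(c("benign", "malignant", "malignant", "benign"),
                levels = levels(truth))

# Accuracy from the confusion matrix, as in the loop above
c_matrix <- table(pred, truth)
acc_table <- sum(diag(c_matrix)) / sum(c_matrix) * 100

# Accuracy as the percentage of matching labels
acc_direct <- mean(pred == truth) * 100

all.equal(acc_table, acc_direct)   # TRUE, both are 75
```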