
I am trying to prepare data for cluster analysis, so I built data tables in Excel with the headers "id", "name", "crime_type", "crime_date", "gender", "age". I then saved the Excel file in .csv format and ran the following commands:

crime <- read.csv("crime_data.csv", header = TRUE)
crime   # printing the data frame works fine

# now cluster with kmeans()
kmeans.result <- kmeans(crime, 3)

But it throws an error:

Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In kmeans(crime, 3) : NAs introduced by coercion

What am I doing wrong here?

Not reproducible... please at least add the results of str(crime) to your question. - Ben Bolker
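The check the comment asks for can be sketched on a small made-up data frame that uses the question's headers (all values below are hypothetical, not the asker's data). It shows which columns are character rather than numeric; kmeans() can only work with numeric columns, and character columns are what produce "NAs introduced by coercion".

```r
# Hypothetical toy data with the question's column names (values are made up)
crime <- data.frame(
  id         = 1:4,
  name       = c("A", "B", "C", "D"),
  crime_type = c("theft", "assault", "theft", "fraud"),
  crime_date = c("2013-01-05", "2013-02-11", "2013-03-20", "2013-04-02"),
  gender     = c("M", "F", "M", "F"),
  age        = c(23, 35, 41, 29),
  stringsAsFactors = FALSE
)

str(crime)                           # lists each column with its type (chr, int, num)
col_classes <- sapply(crime, class)  # named vector of column classes
print(col_classes)

# Columns that kmeans() cannot use as they are
non_numeric <- names(crime)[!sapply(crime, is.numeric)]
print(non_numeric)
```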

2 Answers


I can't speak to your specific problem without knowing what your data looks like, but it could be as simple as giving the xlsx package a try. I think it handles NaNs better.

install.packages("xlsx")
library(xlsx)
yourdata <- read.xlsx("YOURDATASHEET.xlsx", sheetName="THESHEETNAME")

It seems like you are asking two questions. For the first, you can also try reading directly from the clipboard (beware of large tables, though; so far I have had good results with 40k rows and 30 columns):

d1 <- read.table(file = "clipboard", sep = "\t", header = FALSE, stringsAsFactors = FALSE)

Set header to TRUE if the first row of the copied range holds the column names. You can also use what was suggested above to open Excel sheets directly, but this might not be practical if you have non-standard tables.
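The clipboard can't be reproduced here, so this sketch passes a made-up tab-separated string through the text= argument of read.table() in place of file = "clipboard". It shows what header changes: with header = FALSE the header row is read as data, which turns every column into character; with header = TRUE the first row supplies the names and numeric columns stay numeric.

```r
# Hypothetical tab-separated text standing in for the clipboard contents
txt <- "id\tgender\tage\n1\tM\t23\n2\tF\t35"

# header = FALSE: the first row is treated as data, so columns become character
d0 <- read.table(text = txt, sep = "\t", header = FALSE, stringsAsFactors = FALSE)

# header = TRUE: the first row becomes the column names
d1 <- read.table(text = txt, sep = "\t", header = TRUE, stringsAsFactors = FALSE)

str(d0)  # V1, V2, V3, all character, 3 rows
str(d1)  # id, gender, age; id and age are integer, 2 rows
```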

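For the second part, kmeans() needs numeric input, so perhaps you should convert your columns to numeric, for example with the sapply function. Note that suppressWarnings() only hides the "NAs introduced by coercion" warning; it does not remove the NAs that cause the error.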
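A minimal sketch of that conversion, assuming a hypothetical toy data frame with the question's columns. The choices here are assumptions, not the only sensible ones: id and name are dropped as identifiers, crime_type and gender are turned into integer codes with sapply(), crime_date becomes days since 1970-01-01, and everything is standardised with scale() so the large date values don't dominate the distances.

```r
set.seed(42)  # kmeans() picks random starting centres

# Hypothetical toy data with the question's column names (values are made up)
crime <- data.frame(
  id         = 1:6,
  name       = c("A", "B", "C", "D", "E", "F"),
  crime_type = c("theft", "assault", "theft", "fraud", "assault", "theft"),
  crime_date = c("2013-01-05", "2013-02-11", "2013-03-20",
                 "2013-04-02", "2013-05-15", "2013-06-30"),
  gender     = c("M", "F", "M", "F", "M", "F"),
  age        = c(23, 35, 41, 29, 52, 19),
  stringsAsFactors = FALSE
)

# sapply() applies the same factor -> integer coding to each categorical column
codes <- sapply(crime[c("crime_type", "gender")],
                function(col) as.integer(factor(col)))

# Combine into an all-numeric matrix (id and name are left out on purpose)
features <- cbind(codes,
                  crime_date = as.numeric(as.Date(crime$crime_date)),
                  age        = crime$age)

# Standardise each column to mean 0, sd 1
features_scaled <- scale(features)

# Every column is numeric now, so there is no coercion to NA
kmeans.result <- kmeans(features_scaled, centers = 3)
print(kmeans.result$cluster)
```

Integer codes impose an arbitrary order on categories such as crime_type; dummy (one-hot) columns, e.g. built with model.matrix(), are an alternative if that ordering is a concern.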