
I'm stuck on plotting my data with ggplot. The output I have now is a data frame with 16 obs. of 3 variables. I used unlist to convert it to a different data type, but I still get errors when I plot. My code:

library(datasets)
data(iris)
cluster_data<-iris[-5]

calcss <- function(missingvar,kval) {
    cluster<-kmeans(cluster_data[-missingvar],kval,nstart=100)
    TotWithinSS<-cluster$tot.withinss
    return(TotWithinSS)
}
kvals=list()
sumsqs=list()
missvars=list()
for(k in 2:5){
    for(var in 1:4){
       kvals=rbind(kvals,k)
       sumsqs=rbind(sumsqs,calcss(var,k))
       missvars=rbind(missvars,var)
    }
}
out<-data.frame(kvals,missvars,sumsqs)
ggplot(data=out,aes(missvars,sumsqs,color=kvals))

The error says:

Don't know how to automatically pick scale for object of type list. Defaulting to continuous.
Don't know how to automatically pick scale for object of type list. Defaulting to continuous.
Don't know how to automatically pick scale for object of type list. Defaulting to continuous.
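A minimal sketch of where the list type comes from, assuming only base R (the snippet and the names `from_list` and `from_numeric` are hypothetical, not from the thread). The `rbind()` documentation says the result type is the highest type among its inputs, and `list` ranks above every atomic type. A `list()` starting value therefore makes every later `rbind()` result list-typed, which ggplot2 cannot map to a scale. Starting from `numeric(0)` keeps the values atomic. The sketch appends with `c()`, which is a choice made here for simplicity rather than the thread's code.

```r
# Grow one object from list() with rbind() and another from numeric(0) with c()
from_list <- list()
from_numeric <- numeric(0)
for (k in 2:5) {
  from_list <- rbind(from_list, k)       # list() forces a list-typed matrix
  from_numeric <- c(from_numeric, k)     # stays an atomic double vector
}

typeof(from_list)     # "list": the type ggplot2 complains about
typeof(from_numeric)  # "double": usable as an aesthetic
```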

Two things. Initialise kvals, sumsqs, missvars as numeric(0), not as list(). And add a geom to your plot, such as + geom_point(). – Andrew Gustar
Thank you! Works great now! – Deborah Paul

1 Answer


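ggplot2 can't work with list columns. That is why it prints the "object of type list" message and falls back to a continuous scale. It's also not recommended to grow your objects inside a for loop, because that gets very slow with bigger data. See how to do it more efficiently here and here.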

So you should pre-allocate your objects before the for loop, with either rep or vector:

library(datasets)
library(ggplot2)

data(iris)
cluster_data <- iris[-5]

calcss <- function(missingvar, kval) {
  cluster <- kmeans(cluster_data[-missingvar], kval, nstart = 100)
  TotWithinSS <- cluster$tot.withinss
  return(TotWithinSS)
}

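n <- length(2:5) * length(1:4)  # 16 combinations of k and left-out variable

kvals = rep(NA, n) # or use kvals = vector("numeric", n)
sumsqs = rep(NA, n)
missvars = rep(NA, n)

i <- 0  # index of the next slot to fill
for(k in 2:5) {
  for(var in 1:4) {
    i <- i + 1
    # assign into the pre-allocated slot instead of growing with rbind()
    kvals[i] <- k
    sumsqs[i] <- calcss(var, k)
    missvars[i] <- var
  }
}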

out <- data.frame(kvals, missvars, sumsqs)

ggplot(data = out, aes(missvars, sumsqs, color = kvals)) +
  geom_point()

Created on 2018-05-31 by the reprex package (v0.2.0).
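As an alternative that is not part of the original answer, the loop can be dropped altogether. Build the 16 combinations with `expand.grid()`, then compute one total within-cluster sum of squares per row with `mapply()`. This is a minimal sketch that assumes the same iris data and `calcss()` helper as above. Both are redefined here so the block runs on its own. The `set.seed()` call is an added assumption so that repeated runs use the same k-means starts.

```r
library(datasets)

set.seed(42)  # assumption: fixed seed for reproducible kmeans starts

cluster_data <- iris[-5]  # drop the Species column

calcss <- function(missingvar, kval) {
  # total within-cluster sum of squares with one variable left out
  cluster <- kmeans(cluster_data[-missingvar], kval, nstart = 100)
  cluster$tot.withinss
}

# 4 left-out variables x 4 values of k = 16 rows; missvars varies fastest,
# matching the order of the nested loop in the answer
out <- expand.grid(missvars = 1:4, kvals = 2:5)

# one calcss() call per row; mapply() simplifies the results to a numeric vector
out$sumsqs <- mapply(calcss, out$missvars, out$kvals)

str(out)
```

All three columns of `out` are atomic and named like the answer's vectors, so the same `ggplot(data = out, aes(missvars, sumsqs, color = kvals)) + geom_point()` call should plot it. Wrapping `kvals` in `factor()` would give a discrete colour legend instead of a gradient, if that reads better.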