I'm doing a cluster analysis and have a data frame with a number of attribute columns plus a column holding the cluster label. I need to summarize this data.

I'm doing a group-by sum that I want to apply to multiple columns in my data frame. I tried storing each resulting tibble in a list and then combining them into a single data frame:

datalist = list()

for (i in 5:15){

     dat = df %>%
           group_by(cluster) %>% 
           summarise((colnames(df)[i]) = (sum(colnames(df)[i])))
     dat$i = i
     datalist[[i]] = dat
}

combined = do.call(cbind, datalist)

Expected output: a data frame with one row per cluster label and one column per attribute (attr1 through attrN), each holding that attribute's sum within the cluster.

1 Answer

You might be looking for the summarise_all() function in dplyr, which applies a given function to every non-grouping column of the data frame:

library(dplyr)

# Sum every non-grouping column within each cluster, ignoring NAs
df %>% 
  group_by(cluster) %>% 
  summarise_all(~sum(., na.rm = TRUE))

To summarize only a subset of columns, look at ?summarise_at.
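
As a rough sketch of the subset case, the following sums two chosen columns per cluster. The toy data and the column names attr1, attr2, attr3 are made up for illustration and are not from the question. The second form uses across(), which needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data: a cluster label plus three attribute columns
df <- tibble(
  cluster = c(1, 1, 2, 2),
  attr1   = c(1, 2, 3, NA),
  attr2   = c(10, 20, 30, 40),
  attr3   = c(0.5, 0.5, 1, 1)
)

# Sum only attr1 and attr2 within each cluster, ignoring NAs
by_at <- df %>%
  group_by(cluster) %>%
  summarise_at(vars(attr1, attr2), ~ sum(., na.rm = TRUE))

# Same idea with across() (dplyr >= 1.0.0)
by_across <- df %>%
  group_by(cluster) %>%
  summarise(across(c(attr1, attr2), ~ sum(.x, na.rm = TRUE)))
```

If the columns are only known by position, as in the 5:15 loop from the question, one option is to pass their names instead, e.g. across(all_of(colnames(df)[5:15]), ...), assuming the cluster column is not among them.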