Hi:I'm new to the plyr/dplyr family but enjoying it. I can see it's massive utility for my own work, but I'm stil trying to get my head around it.
I have a data frame that looks like below.
1) How do I produce a table for each non-grouping variable that shows the distribution of responses within each value of the grouping variable?
2) Note: I do have some missing values and I would like to exclude them from the tabulation. I realize the summarize_each command will apply the function to each column, but I don't know how to handle the missing values issue in a simple way. I have seen some codes that suggest you have to filter out missing values, but what if the missing values are scattered randomly through the non-grouping variables?
3) Fundamentally, is it best to just use complete cases with dplyr?
#library
library(dplyr)
#sample data
group<-sample(c('A', 'B', 'C'), 100, replace=TRUE)
var1<-sample(c(1,2,3,4,5,NA), 100, replace=TRUE, prob=c(0.15,0.15,0.15,0.15,0.15,0.25))
var2<-sample(c(1,2,3,4,5,NA), 100, replace=TRUE, prob=c(0.15,0.15,0.15,0.15,0.15,0.25))
var3<-sample(c(1,2,3,4,5,NA), 100, replace=TRUE, prob=c(0.15,0.15,0.15,0.15,0.15,0.25))
df<-data.frame(group, var1, var2, var3)
#my code
out_df<-df %>%group_by(group)
out_df %>% summarise_each(funs(table))