7
votes
library(tidyverse) 

I'm stuck on something that should be so simple! Using the code below, all I want to do is group and summarise the three "Var" columns. I want counts and sums (so that I can create three percentage columns, so bonus if you can include an easy way to accomplish this in your answer). However, I don't want to include the NA's. Removing the NA's from sum is easy enough by using "na.rm=TRUE", but I can't seem to figure out how to not include the NA's in the counts (using n() ) while using dplyr::summarise_at.

Am I missing something very simple?

Df%>%group_by(Group)%>%summarise_at(vars(Var1:Var3),funs(n(),sum((.),na.rm=TRUE)))

Group<-c("House","Condo","House","House","House","House","House","Condo")
Var1<-c(0,1,1,NA,1,1,1,0)    
Var2<-c(1,1,1,1,0,1,1,1)
Var3<-c(1,1,1,NA,NA,1,1,0)

Df<-data.frame(Group,Var1,Var2,Var3)
2
Perhaps try summarise_all(funs(cnt = sum(!is.na(.)), sum = sum(.,na.rm=TRUE)))markdly
@markdly you shld make that an answer.hrbrmstr
Thanks @hrbrmstr, have posted as answermarkdly

2 Answers

10
votes

I think your code was very close to getting the job done. I made some slight changes and have included an example of how you might include the percent calculation in the same step (although I am not sure of your expected output).

library(dplyr)
Df %>% 
  group_by(Group) %>% 
  summarise_all(funs(count = sum(!is.na(.)), 
                     sum = sum(.,na.rm=TRUE),
                     pct = sum(.,na.rm=TRUE)/sum(!is.na(.))))

#> # A tibble: 2 x 10
#>    Group Var1_count Var2_count Var3_count Var1_sum Var2_sum Var3_sum
#>   <fctr>      <int>      <int>      <int>    <dbl>    <dbl>    <dbl>
#> 1  Condo          2          2          2        1        2        1
#> 2  House          5          6          4        4        5        4
#> # ... with 3 more variables: Var1_pct <dbl>, Var2_pct <dbl>,
#> #   Var3_pct <dbl>

I've also used summarise_all instead of summarise_at as summarise_all works on all the variables which aren't group variables.

0
votes

I think you just need to move your 'na.rm()' argument back in the parentheses. See below:

Group<-c("House","Condo","House","House","House","House","House","Condo")
Var1<-c(0,1,1,NA,1,1,1,0)    
Var2<-c(1,1,1,1,0,1,1,1)
Var3<-c(1,1,1,NA,NA,1,1,0)

Df<-data.frame(Group,Var1,Var2,Var3)

out <- Df %>%
  group_by(Group) %>% 
  mutate_at(vars(Var1:Var3), funs(total = sum(!(is.na(.))), sum = sum(., na.rm = T))) %>% 
  ungroup()