1
votes

I would like to be able to use a condition to filter what gets summarised in summarise_at calls. I know this can be done when calling dplyr:: summarise() directly but I would like to avoid having to specify this for each variable.

I have provided the below code thinking that this call to summarise_at would give me the correct data. But it does not. I would like the code to return the data set labelled final data. Where there is a row for each individual with each variable summed ONLY when the row matches with the filtering condition. When it does not match or there is nothing to sum I would like it to return 0 or NA.

Data = tibble(Group = c(1,1,1,1,1,1,1,1),
              Week = c(1,2,3,4,1,2,3,4),
              Condition = c(1,1,1,0,0,1,1,0),
              Var1 = 1:8,
              Var2 = 9:16,
              Var3 = 17:24)

Data %>% 
    group_by(Group, Week, Condition) %>% 
    summarise_at(vars(Var1, Var2, Var3), ~sum(., na.rm = T)) %>%
    ungroup()


Final.Data = tibble(Group = c(1,1,1,1),
                    Week = c(1,2,3,4),
                    Var1 = c(1,8,10,0),
                    Var2 = c(9,24,26,0),
                    Var3 = c(17,40,42,0))
2

2 Answers

1
votes

You can sum where Condition == 1

library(dplyr)

Data %>% 
  group_by(Group, Week) %>% 
  summarise_at(vars(Var1, Var2, Var3), ~sum(.[Condition == 1], na.rm = TRUE)) %>%
  ungroup()

# A tibble: 4 x 5
#  Group  Week  Var1  Var2  Var3
#  <dbl> <dbl> <int> <int> <int>
#1     1     1     1     9    17
#2     1     2     8    24    40
#3     1     3    10    26    42
#4     1     4     0     0     0

Or another option is to make Week as factor, filter where Condition == 1 and take sum

Data %>% 
  mutate(Week = factor(Week, levels = unique(Week))) %>%
  filter(Condition == 1) %>%
  group_by(Group, Week, .drop = FALSE) %>% 
  summarise_at(vars(Var1, Var2, Var3), sum, na.rm = TRUE)
1
votes

An option is to do a group_by on 'Group', 'Week', then use summarise_at on the columns that starts_with ("Var"), subset each of the column based on the binary values in 'Condition' which can be converted to logical with as.logical so that 1-> TRUE and 0 -> FALSE, and get the sum of each column

library(dplyr)
Data %>% 
    group_by(Group, Week) %>% 
    summarise_at(vars(starts_with("Var")), ~ 
            sum(.[as.logical(Condition)], na.rm = TRUE)) %>%
    ungroup
# A tibble: 4 x 5
#  Group  Week  Var1  Var2  Var3
#  <dbl> <dbl> <int> <int> <int>
#1     1     1     1     9    17
#2     1     2     8    24    40
#3     1     3    10    26    42
#4     1     4     0     0     0

If there are other values as well, then use ==

Data %>% 
    group_by(Group, Week) %>% 
    summarise_at(vars(starts_with("Var")),
      ~ sum(.[Condition == 1], na.rm = TRUE)) %>%
    ungroup

Or we can multiply by 'Condition' as any value muliplied by 0 gives 0 and 1 gives the number itself, and then do the sum

Data %>% 
   group_by(Group, Week) %>% 
   summarise_at(vars(matches("Var")), ~ sum(. * Condition))

Or using data.table

library(data.table)
setDT(Data)[, lapply(.SD, function(x) sum(x[Condition == 1], na.rm = TRUE)),
       by =  .(Group, Week), .SDcols = grep("Var", names(Data))]
#   Group Week Var1 Var2 Var3
#1:     1    1    1    9   17
#2:     1    2    8   24   40
#3:     1    3   10   26   42
#4:     1    4    0    0    0

Or using an option in base R with rowsum

rowsum(Data[4:6] * Data$Condition, group = do.call(paste, Data[1:2]))

Or using by

do.call(rbind, by(Data[-(1:2)], Data[1:2], FUN = function(x) colSums(x[-1] * x[,1])))