
In the following example, how would I select a value (from mpg) per group (cyl) depending on a condition in another column (carb == 1)? Note that I also want to summarize another variable (averaging qsec per group). My best guess below gets an error:

library(dplyr)
mtcars %>% 
    distinct(cyl, carb, .keep_all = TRUE) %>% 
    group_by(cyl) %>% 
    summarize(
        mpg = mpg[.$carb == 1],
        qsec = mean(qsec)
    )

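If more than one row in a group can have 'carb' equal to 1, mpg[carb == 1] may return several values, while summarise returns only a single row per group (or a single row overall when there are no groups), so it is better to wrap the output in a list. Using $ breaks the grouping: inside the pipe, . refers to the whole data frame passed to summarise, so .$carb is not split by group and its length no longer matches mpg within each group.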
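A quick way to see why .$carb misbehaves is to compare the length of a bare column with the length of .$carb inside the same grouped summarise. This is a small diagnostic sketch, not part of the fix; the column names group_rows and whole_rows are made up for illustration, and it relies on the magrittr %>% placeholder (the native |> pipe has no . placeholder).

```r
library(dplyr)

# Compare what a bare column name and `.$carb` see inside a grouped summarise
lens <- mtcars %>%
  distinct(cyl, carb, .keep_all = TRUE) %>%
  group_by(cyl) %>%
  summarise(
    group_rows = length(carb),   # bare column: only the rows of the current group
    whole_rows = length(.$carb)  # `.` is the whole piped-in data frame, ignoring groups
  )
lens
```

group_rows comes out as 2, 3 and 4, while whole_rows is 9 in every group. So in the question's code the logical index .$carb == 1 has 9 elements but mpg only has 2 to 4 per group, and the selected values do not line up with the group's rows.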

library(tidyverse)
out <- mtcars %>% 
        distinct(cyl, carb, .keep_all = TRUE) %>% 
        group_by(cyl) %>% 
        summarize(
          mpg = list(mpg[carb == 1]),
          qsec = mean(qsec)
        ) 

out
# A tibble: 3 x 3
#    cyl mpg        qsec
#  <dbl> <list>    <dbl>
#1     4 <dbl [1]>  19.3
#2     6 <dbl [1]>  17.1
#3     8 <dbl [0]>  16.2
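To see why the list wrapper matters, the sketch below drops the distinct() step (a change made only for illustration) so that some cylinder groups contain several rows with carb == 1.

```r
library(dplyr)

# Without distinct(), several cars per cylinder group have carb == 1
multi <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    mpg = list(mpg[carb == 1]),  # keep every match in one list element
    qsec = mean(qsec)
  )
lengths(multi$mpg)
```

Here the 4-cylinder group holds five mpg values, the 6-cylinder group two and the 8-cylinder group none, which could not be stored in a plain one-value-per-group column.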

The output shows that for 'cyl' 8 there is no 'carb' equal to 1, so that list element is numeric(0).

Wrapping with replace_na changes the length-0 elements to NA, and then we can unnest. Otherwise, as @Dave Gruenewald mentioned in the comments, that row would be dropped automatically when unnesting.

out %>% 
  mutate(mpg = replace_na(mpg)) %>% 
  unnest
# A tibble: 3 x 3
#    cyl  qsec   mpg
#  <dbl> <dbl> <dbl>
#1     4  19.3  22.8
#2     6  17.1  21.4
#3     8  16.2  NA  
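With tidyr 1.0.0 or later, unnest() also has a keep_empty argument that turns size-0 elements into a row of NA instead of dropping them, which avoids the replace_na step. A minimal sketch, assuming a tidyr version that has keep_empty:

```r
library(dplyr)
library(tidyr)

out <- mtcars %>%
  distinct(cyl, carb, .keep_all = TRUE) %>%
  group_by(cyl) %>%
  summarize(
    mpg = list(mpg[carb == 1]),
    qsec = mean(qsec)
  )

# keep_empty = TRUE keeps the cyl 8 row, filling mpg with NA
res <- unnest(out, cols = mpg, keep_empty = TRUE)
res
```

Note that the column order may differ from the output shown above, because newer tidyr versions keep the unnested column in its original position.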

Another option: if we already know that at most one 'carb' value per 'cyl' equals 1, use an if/else condition in summarise

mtcars %>%
    distinct(cyl, carb, .keep_all = TRUE) %>% 
    group_by(cyl) %>%
    summarise(
       mpg = if (any(carb == 1)) mpg[carb == 1] else NA_real_,
       qsec = mean(qsec)
    )
# A tibble: 3 x 3
#     cyl   mpg  qsec
#   <dbl> <dbl> <dbl>
#1     4  22.8  19.3
#2     6  21.4  17.1
#3     8  NA    16.2
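Under the same "at most one match" assumption, dplyr's first() gives a shorter equivalent, since it returns a missing value of the right type when its input is empty. This is an alternative sketch, not the approach above:

```r
library(dplyr)

res <- mtcars %>%
  distinct(cyl, carb, .keep_all = TRUE) %>%
  group_by(cyl) %>%
  summarise(
    # first() returns NA when mpg[carb == 1] is empty (cyl 8 here)
    mpg = first(mpg[carb == 1]),
    qsec = mean(qsec)
  )
res
```

The trade-off is that first() silently keeps only the first match if a group unexpectedly has several, whereas the if/else version would surface that as a length problem.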

However, it is safer to assume that more than one 'carb' value could equal 1 within a 'cyl' group, so wrap the result in a list and unnest it afterwards

mtcars %>%
    distinct(cyl, carb, .keep_all = TRUE) %>% 
    group_by(cyl) %>%
    summarise(
       mpg = list(if (any(carb == 1)) mpg[carb == 1] else NA_real_),
       qsec = mean(qsec)) %>%
    unnest(mpg)
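To check that this version really copes with several matches, the sketch below runs the same summarise on the full mtcars without distinct() (a change made only for illustration), so the 4- and 6-cylinder groups have more than one carb == 1 row.

```r
library(dplyr)
library(tidyr)

res <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    # one list element per group; NA_real_ stands in when there is no match
    mpg = list(if (any(carb == 1)) mpg[carb == 1] else NA_real_),
    qsec = mean(qsec)
  ) %>%
  unnest(mpg)

table(res$cyl)
```

After unnesting there are 5 rows for cyl 4, 2 for cyl 6 and a single NA row for cyl 8, with the group's mean qsec repeated on each row.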