Conditional count and mean by grouped data without filter or left_join

Question

How can I calculate a conditional count and mean by group with dplyr, without using filter? I don't want to shrink the dataset with filter, and I don't want to left_join the summary back to recover a dataset of the original length. Is that possible?

Here is an example:

mtcars %>%
       filter(cyl > 4, gear < 5) %>%
       group_by(gear, carb, am) %>%
       mutate(avg_mpg = mean(mpg, na.rm = TRUE), cnt = n())

How can I get the above results without using filter or left_join?

Did you mean mtcars %>% group_by(gear, carb, am) %>% mutate(avg_mpg = mean(mpg[cyl > 4 & gear < 5], na.rm = TRUE))? — akrun
@akrun Could you add the count to your suggested answer, please? — Geet
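Yes: mtcars %>% group_by(gear, carb, am) %>% mutate(avg_mpg = mean(mpg[cyl > 4 & gear < 5], na.rm = TRUE), cnt = sum(cyl > 4 & gear < 5)). Note that n() would count every row in the group, not just the rows meeting the condition. — akrun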

akrun · Accepted Answer · 2018-06-25T18:54:00

We can store the condition as a logical column, use it to subset 'mpg' within each group to get the mean, and compute 'cnt' as the sum of that logical vector (each TRUE counts as 1). Because nothing is filtered, every row of the original data is kept.

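library(dplyr)

mtcars %>%
       mutate(ind = cyl > 4 & gear < 5) %>%            # flag rows meeting the condition
       group_by(gear, carb, am) %>%
       mutate(avg_mpg = mean(mpg[ind], na.rm = TRUE),  # mean over flagged rows in the group
              cnt = sum(ind)) %>%                       # number of flagged rows in the group
       select(-ind)                                     # drop the helper column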

NOTE: The 'ind' column is created only because the condition is used twice (once for the mean and once for the count); it is dropped at the end with select(-ind).
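To check that this gives the same numbers as the filter-based pipeline from the question while keeping all rows, one can run both side by side. This is a minimal sketch assuming dplyr is installed; `full`, `filtered`, and `kept` are illustrative names, not part of the original answer. It also shows one side effect worth knowing: groups with no qualifying rows (here every gear == 5 group, since the condition requires gear < 5) get avg_mpg = NaN (the mean of an empty vector) and cnt = 0 instead of being dropped.

```r
library(dplyr)

# Conditional summaries on the full data (no rows removed)
full <- mtcars %>%
  mutate(ind = cyl > 4 & gear < 5) %>%
  group_by(gear, carb, am) %>%
  mutate(avg_mpg = mean(mpg[ind], na.rm = TRUE),
         cnt = sum(ind)) %>%
  ungroup()

# Reference result: the filter-based pipeline from the question
filtered <- mtcars %>%
  filter(cyl > 4, gear < 5) %>%
  group_by(gear, carb, am) %>%
  mutate(avg_mpg = mean(mpg, na.rm = TRUE), cnt = n()) %>%
  ungroup()

# Same number of rows as the input
nrow(full) == nrow(mtcars)

# On rows meeting the condition, both pipelines agree (row order is preserved)
kept <- full %>% filter(ind)
all.equal(kept$avg_mpg, filtered$avg_mpg)
all.equal(kept$cnt, filtered$cnt)

# Groups with no qualifying rows: NaN mean, zero count
full %>% filter(gear == 5) %>% select(gear, carb, am, avg_mpg, cnt)
```

If NaN is unwanted for such groups, it can be replaced afterwards (for example with NA) once the grouped mutate is done.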
