1
votes

How can I calculate a conditional count and mean on grouped data with dplyr, without using filter? I don't want to shrink the dataset with filter, and I don't want to left_join a summary back to recover a dataset of the original length. Is that possible?

Here is a case:

mtcars %>% filter(cyl > 4, gear < 5) %>% group_by(gear, carb, am) %>% mutate(avg_mpg = mean(mpg, na.rm = TRUE), cnt = n())

How can I get the above results without using filter or left_join?

Did you mean mtcars %>% group_by(gear, carb, am) %>% mutate(avg_mpg = mean(mpg[cyl > 4 & gear < 5], na.rm = TRUE)) – akrun
@akrun Could you include the count in your suggested answer, please? – Geet
Yes, mtcars %>% group_by(gear, carb, am) %>% mutate(avg_mpg = mean(mpg[cyl > 4 & gear < 5], na.rm = TRUE), cnt = n()) – akrun
Is this what you want? – akrun
In that case, cnt = sum(cyl > 4 & gear < 5) – akrun

1 Answer

1
votes

We can store the condition as a logical column, use it to subset 'mpg' within each group when computing the mean, and compute 'cnt' as the sum of that logical vector (each TRUE counts as 1).

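library(dplyr)

mtcars %>%
  # flag the rows that meet the condition instead of filtering them out
  mutate(ind = cyl > 4 & gear < 5) %>%
  group_by(gear, carb, am) %>%
  mutate(avg_mpg = mean(mpg[ind], na.rm = TRUE),  # mean over flagged rows only
         cnt = sum(ind)) %>%                      # number of flagged rows per group
  select(-ind)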

NOTE: The 'ind' column is created only because the condition is used more than once; it is dropped again by the final select.
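
As a rough sanity check, the sketch below compares this approach with the filter-based pipeline from the question. The object names `filtered`, `full`, and `matched` are made up for illustration, and the calls to ungroup() are added only to make the comparison simpler. It assumes dplyr is installed and uses the built-in mtcars data.

```r
library(dplyr)

# Reference: the filter-based pipeline from the question.
filtered <- mtcars %>%
  filter(cyl > 4, gear < 5) %>%
  group_by(gear, carb, am) %>%
  mutate(avg_mpg = mean(mpg, na.rm = TRUE), cnt = n()) %>%
  ungroup()

# Full-length version: same statistics, but no rows are dropped.
full <- mtcars %>%
  mutate(ind = cyl > 4 & gear < 5) %>%
  group_by(gear, carb, am) %>%
  mutate(avg_mpg = mean(mpg[ind], na.rm = TRUE),
         cnt = sum(ind)) %>%
  ungroup()

# The full-length result keeps every row of mtcars.
stopifnot(nrow(full) == nrow(mtcars))

# Restricted to the rows that meet the condition, both results should agree.
matched <- full %>% filter(ind) %>% select(-ind)
stopifnot(isTRUE(all.equal(as.data.frame(matched), as.data.frame(filtered))))
```

One difference to keep in mind: rows that do not meet the condition are still present in the full-length result, and a group with no qualifying rows gets cnt equal to 0 and avg_mpg equal to NaN, because the mean of an empty vector is NaN.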