0
votes

Take for example the mtcars data set. I would like to compare the ratio of mpg of cars that are grouped by cyl only , to the cars that are grouped by both cyl and carb. Problem is that grouping the dataset using dplyr creates one level of granularity which makes it impossible to compare to a different level of grouping. So what I did was create 2 new data sets, with each grouping, and then joined them together to compare the 2 means with a mutated column, as below. This worked, but it just seems like a a roundabout inefficient way to code . What is the proper way to do this? my code:

cyl_only <- mtcars %>% 
  group_by(cyl) %>% 
  summarise(cyl_only_mean= mean(mpg))
cyl_carb <- mtcars %>% 
  group_by(cyl,carb) %>% 
  summarise(cyl_carb_mean= mean(mpg))
cyl_carb_join <- cyl_only %>% 
  left_join(cyl_carb,by="cyl") 
mtcars_result <- mutate(cyl_carb_join,ratio= cyl_only_mean/cyl_carb_mean)
1

1 Answers

0
votes

You can accomplish this without doing the joining by bringing along the summary information needed to calculate a mean.

mtcars %>%
    group_by(cyl, carb) %>%
    summarise(sum_mpg = sum(mpg),
              count = n(),
              cyl_carb_mean = mean(mpg)) %>%
    group_by(cyl) %>%
    mutate(cyl_only_mean = sum(sum_mpg) / sum(count),
           ratio = cyl_only_mean/cyl_carb_mean)

    cyl  carb sum_mpg count cyl_carb_mean cyl_only_mean ratio
  <dbl> <dbl>   <dbl> <int>         <dbl>         <dbl> <dbl>
1     4     1   138.      5          27.6          26.7 0.967
2     4     2   155.      6          25.9          26.7 1.03 
3     6     1    39.5     2          19.8          19.7 1.00 
4     6     4    79       4          19.8          19.7 1.00 
5     6     6    19.7     1          19.7          19.7 1.00 
6     8     2    68.6     4          17.2          15.1 0.880
7     8     3    48.9     3          16.3          15.1 0.926
8     8     4    78.9     6          13.2          15.1 1.15 
9     8     8    15       1          15            15.1 1.01