0
votes

Using the mpg data set, I want to group by displacement (2.0 and 3.1).

  • Next, sum the cty values within each group (for example, all the cty of the 2.0 group).
  • Lastly, in a final column, divide each group's sum by the combined cty sum for 2.0 and 3.1.

So far I've only been able to do the grouping without error:

library(dplyr)
library(ggplot2)  # provides the mpg data set

data(mpg)

mpg2 <- filter(mpg, manufacturer == "audi" & year == 2008 & cyl < 8)

x <- group_by(mpg2, displ) 
    # %>% mutate(total_cty = {sum(.$cty)}) # attempted new column: total of cty for each group (2.0, 3.1)
    # proportion = total_cty / (total_cty_of.2.0 + total_cty_of.3.1)

I know group_by alone doesn't change how the data looks; the grouping only shows up once you apply an aggregate such as summarise. I would still like to see the new result, if possible.

1 Answer

1
votes

Don't use $ in dplyr pipes; it is very rarely useful there. .$cty refers to the cty column of the whole data frame being piped in, so the grouping is lost.
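As a small sketch of that point, the two mutate calls below differ only in whether $ is used. It assumes mpg2 is built as in the question and that ggplot2 is installed (for the mpg data); the names with_dollar and without_dollar are made up for illustration.

```r
library(dplyr)

# Same filter as in the question; ggplot2::mpg avoids attaching ggplot2.
mpg2 <- filter(ggplot2::mpg, manufacturer == "audi", year == 2008, cyl < 8)

# .$cty is the full cty column of the piped data frame, so every row
# gets the overall total regardless of the grouping.
with_dollar <- mpg2 %>%
  group_by(displ) %>%
  mutate(total_cty = sum(.$cty))

# A bare cty is evaluated within each group, giving one total per displ.
without_dollar <- mpg2 %>%
  group_by(displ) %>%
  mutate(total_cty = sum(cty))

unique(with_dollar$total_cty)     # a single value: the overall total
unique(without_dollar$total_cty)  # one value per displ group
```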

We can calculate the sum of cty for each displ value and then calculate each sum's proportion of the overall total.

library(dplyr)

mpg2 %>%
  group_by(displ) %>%
  summarise(cty = sum(cty)) %>%
  mutate(cty_prop = cty/sum(cty))

# displ   cty cty_prop
#  <dbl> <int>    <dbl>
#1   2      80    0.544
#2   3.1    67    0.456
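If the goal is to keep every row of mpg2 and add the totals as new columns, as the commented-out mutate in the question suggests, one possible variant is sketched below. It is not part of the answer above; the column names total_cty and cty_prop and the object name per_row are illustrative, and mpg2 is rebuilt from ggplot2::mpg so the snippet runs on its own.

```r
library(dplyr)

# Same filter as in the question; assumes ggplot2 is installed for mpg.
mpg2 <- filter(ggplot2::mpg, manufacturer == "audi", year == 2008, cyl < 8)

per_row <- mpg2 %>%
  group_by(displ) %>%
  mutate(total_cty = sum(cty)) %>%          # group total, repeated on each row
  ungroup() %>%                             # drop grouping so sum(cty) spans all rows
  mutate(cty_prop = total_cty / sum(cty))   # group total / overall total

# Show only the columns of interest.
select(per_row, model, displ, cty, total_cty, cty_prop)
```

Each row keeps its original values, and rows with the same displ share the same total_cty and cty_prop.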