1
votes

Bear with me... I am using R/RStudio with the mtcars data and dplyr's mutate and summarise functions. I also tried group_by.

I want to center the values of mtcars$mpg, then display a summary of the centered mpg for each number of cylinders (cyl).

So far...

mtcars %>% mutate(centered_mpg = mpg - mean(mpg, na.rm = TRUE)) %>% summarise(centered_mpg, cyl)

The above produces:

centered_mpg cyl
0.909375 6
0.909375 6
2.709375 4
1.309375 6
... ...

Instead, I want:

centered_mpg cyl
x1 4
x2 6
x3 8
2
How do you want to summarize the centered mpg for each group of cars (for each cyl value)? Do you want the mean centered mpg by cyl? Or median, sum, something else? I imagine something like mtcars %>% mutate(centered_mpg = mpg - mean(mpg, na.rm = TRUE)) %>% group_by(cyl) %>% summarise(mean_centered_mpg = mean(centered_mpg)), but swap out the mean() inside summarize() for whatever function you'd like. - Gregor Thomas
I think that's the ticket. - oaxacamatt

2 Answers

2
votes

Are you looking for this?

with(mtcars, aggregate(list(centered_mpg=scale(mpg, scale=FALSE)), list(cyl=cyl), mean))
#   cyl centered_mpg
# 1   4    6.5730114
# 2   6   -0.3477679
# 3   8   -4.9906250
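
As a quick sanity check on the call above: scale(mpg, scale=FALSE) only subtracts the column mean, so it should match the manual mpg - mean(mpg) centering used in the question. A minimal sketch (the variable names centered_scale and centered_manual are just for illustration; as.numeric() drops the matrix attributes that scale() attaches):

```r
# scale() with scale = FALSE centers a column without dividing by its SD.
# It returns a one-column matrix, so convert it to a plain numeric vector.
centered_scale  <- as.numeric(scale(mtcars$mpg, scale = FALSE))

# Manual centering, as in the question.
centered_manual <- mtcars$mpg - mean(mtcars$mpg)

# Should report TRUE if both approaches agree.
all.equal(centered_scale, centered_manual)
```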
2
votes

It looks like you want to center each individual car's mpg by subtracting the global mean(mpg). This gives a centered_mpg for every car, and the mutate() step you have looks fine for this.

Then you want to calculate some sort of "summary" of the centered mpg values by cylinder group. For that we need to group_by(cyl) and then pass whatever summary function you want to summarise(). Here I use mean(), but you can use median(), sum(), or whatever else you'd like.

mtcars %>% 
  mutate(centered_mpg = mpg - mean(mpg, na.rm = TRUE)) %>% 
  group_by(cyl) %>% 
  summarise(mean_centered_mpg = mean(centered_mpg))
# # A tibble: 3 x 2
#     cyl mean_centered_mpg
#   <dbl>             <dbl>
# 1     4             6.57 
# 2     6            -0.348
# 3     8            -4.99
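
If you are not sure yet which summary you want, summarise() accepts several at once. The following is only a sketch along the same lines as the code above; the result name centered_by_cyl and the column names n_cars and median_centered_mpg are made up for illustration:

```r
library(dplyr)

centered_by_cyl <- mtcars %>%
  # Center every car's mpg on the global mean.
  mutate(centered_mpg = mpg - mean(mpg, na.rm = TRUE)) %>%
  # One output row per cyl value.
  group_by(cyl) %>%
  # Several summaries of the same centered column side by side.
  summarise(
    n_cars              = n(),
    mean_centered_mpg   = mean(centered_mpg),
    median_centered_mpg = median(centered_mpg)
  )

centered_by_cyl
```

Each name on the left of = becomes a column in the result, so you can drop the ones you don't need.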