35
votes

I want to calculate mean (or any other summary statistics of length one, e.g. min, max, length, sum) of a numeric variable ("value") within each level of a grouping variable ("group").

The summary statistic should be assigned to a new variable which has the same length as the original data. That is, each row of the original data should have a value corresponding to the current group value - the data set should not be collapsed to one row per group. For example, consider group mean:

Before

id  group  value
1   a      10
2   a      20
3   b      100
4   b      200

After

id  group  value  grp.mean.values
1   a      10     15
2   a      20     15
3   b      100    150
4   b      200    150
4

4 Answers

29
votes

You may do this in dplyr using mutate:

library(dplyr)
df %>%
  group_by(group) %>%
  mutate(grp.mean.values = mean(value))

...or use data.table to assign the new column by reference (:=):

library(data.table)
setDT(df)[ , grp.mean.values := mean(value), by = group]
27
votes

Have a look at the ave function. Something like

df$grp.mean.values <- ave(df$value, df$group)

If you want to use ave to calculate something else per group, you need to specify FUN = your-desired-function, e.g. FUN = min:

df$grp.min <- ave(df$value, df$group, FUN = min)
7
votes

One option is to use plyr. ddply expects a data.frame (the first d) and returns a data.frame (the second d). Other XXply functions work in a similar way; i.e. ldply expects a list and returns a data.frame, dlply does the opposite...and so on and so forth. The second argument is the grouping variable(s). The third argument is the function we want to compute for each group.

require(plyr)
ddply(dat, "group", transform, grp.mean.values = mean(value))

  id group value grp.mean.values
1  1     a    10              15
2  2     a    20              15
3  3     b   100             150
4  4     b   200             150
3
votes

Here is another option using base functions aggregate and merge:

merge(x, aggregate(value ~ group, data = x, mean), 
     by = "group", suffixes = c("", "mean"))

  group id value.x value.y
1     a  1      10      15
2     a  2      20      15
3     b  3     100     150
4     b  4     200     150

You can get "better" column names with suffixes:

merge(x, aggregate(value ~ group, data = x, mean), 
     by = "group", suffixes = c("", ".mean"))


  group id value value.mean
1     a  1    10         15
2     a  2    20         15
3     b  3   100        150
4     b  4   200        150