5
votes

How can I compute the weighted average of all the fields in a dataset using summarise_each in dplyr? For example, let's say we want to group the mtcars dataset by cyl and compute the weighted average of all columns where the weights are taken as the gear column. I've tried the following but could not get it to work.

mtcars %>% group_by(cyl) %>% summarise_each(funs(weighted.mean(., gear)))

# The line above gives the following output
# Error in weighted.mean.default(c(1, 2, 2, 1, 2, 1, 1, 1, 2, 2, 2), 4.15555555555556) : 
# 'x' and 'w' must have the same length

Thanks in advance for your help!

1 Answer

15
votes

To help see what's going on here, let's write a small function that returns the lengths of its arguments:

lenxy <- function(x,y)
    paste0(length(x),'-',length(y))

and then apply it in summarise_each, as in:

mtcars %>% group_by(cyl) %>% summarise_each(funs(lenxy(., qsec)))

#>   cyl   mpg  disp    hp  drat    wt  qsec   vs   am gear carb
#> 1   4 11-11 11-11 11-11 11-11 11-11 11-11 11-1 11-1 11-1 11-1
#> 2   6   7-7   7-7   7-7   7-7   7-7   7-7  7-1  7-1  7-1  7-1
#> 3   8 14-14 14-14 14-14 14-14 14-14 14-14 14-1 14-1 14-1 14-1

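Looking at this table, you can see that the first and second arguments have the same length up to and including qsec; after that, the second argument to lenxy has length 1. This happens because dplyr operates on the data in place, replacing each field with its summary as it goes, rather than creating a new data.frame. Once qsec has been summarised, every later column sees the length-1 summary instead of the original column. In the question the same thing happens to gear: carb, the only column after it, is weighted by a single value (the 4.1555... in the error message), hence the length mismatch.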
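The same overwriting can be seen with a plain summarise call. This is a minimal sketch, assuming dplyr is installed; the names res and n_gear are made up for illustration:

```r
library(dplyr)

# gear is summarised first, so the second expression sees the
# length-1 summary rather than the original column.
res <- mtcars %>%
  group_by(cyl) %>%
  summarise(gear = mean(gear), n_gear = length(gear))

res$n_gear
```

n_gear comes out as 1 for every group rather than the group sizes 11, 7 and 14.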

The solution is easy: exclude the weighting variable from the summary:

mtcars %>% 
    group_by(cyl) %>% 
    summarise_each(funs(weighted.mean(., gear)),
                   -gear)
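
In more recent dplyr releases, summarise_each() and funs() are deprecated. Assuming dplyr 1.0.0 or later, the same idea can probably be written with across(). This is a sketch rather than part of the original answer, and wm is just an illustrative name:

```r
library(dplyr)

# Weighted mean of every non-grouping column except gear,
# using the untouched gear column as the weights.
wm <- mtcars %>%
  group_by(cyl) %>%
  summarise(across(-gear, ~ weighted.mean(.x, gear)))

wm
```

Because gear is excluded from the selection, it is never overwritten, so each column is weighted by the full-length gear vector of its group.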