weighted mean in dplyr for multiple columns

Question

I'm trying to calculate the weighted mean for multiple columns using dplyr. At the moment I'm stuck with summarize_each, which seems to me to be part of the solution. Here's some example code:

library(dplyr)
f2a <- c(1,0,0,1)
f2b <- c(0,0,0,1)
f2c <- c(1,1,1,1)
clustervar <- c("A","B","B","A")
weight <- c(10,20,30,40)

df <- data.frame (f2a, f2b, f2c, clustervar, weight, stringsAsFactors=FALSE)
df

What I am looking for is something like:

df %>%
  group_by (clustervar) %>%
  summarise_each(funs(weighted.mean(weight)), select=cbind(clustervar, f2a:f2c))

The result of this is only:

# A tibble: 2 × 4
  clustervar select4 select5 select6
       <chr>   <dbl>   <dbl>   <dbl>
1          A      25      25      25
2          B      25      25      25

What am I missing here?

alistaire · Accepted Answer · 2017-04-25T06:23:14

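You can use summarise_at to specify which columns you want to operate on. Inside funs(), the . stands for each selected column, so it is passed to weighted.mean as the data, with weight as the weights: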

df %>% group_by(clustervar) %>% 
    summarise_at(vars(starts_with('f2')), 
                 funs(weighted.mean(., weight)))
#> # A tibble: 2 × 4
#>   clustervar   f2a   f2b   f2c
#>        <chr> <dbl> <dbl> <dbl>
#> 1          A     1   0.8     1
#> 2          B     0   0.0     1
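
For what it's worth, the 25s in the question come from `weighted.mean(weight)`: without a `.` placeholder, the function receives the `weight` column as its data and no weights at all, so it returns the plain mean of `weight` in each group (10 and 40 for A, 20 and 30 for B, both averaging 25). On newer dplyr, `funs()` is deprecated (since 0.8.0). Assuming dplyr 1.0.0 or later is installed, the same summary can be sketched with `across()`; the name `res` below is only illustrative:

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  f2a = c(1, 0, 0, 1),
  f2b = c(0, 0, 0, 1),
  f2c = c(1, 1, 1, 1),
  clustervar = c("A", "B", "B", "A"),
  weight = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Assumes dplyr >= 1.0.0, where across() is available.
# .x is the current f2* column; weight is looked up in the data,
# so within each group it is that group's slice of the weight column.
res <- df %>%
  group_by(clustervar) %>%
  summarise(across(starts_with("f2"), ~ weighted.mean(.x, weight)))

res
```

This should give the same values as the summarise_at call above: f2b is 0.8 for cluster A (40 / (10 + 40)) and 0 for cluster B.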
