Summarize with dplyr “other than” groups

Question

I need to summarize a grouped data_frame (note: a dplyr solution is much appreciated but not mandatory), computing both a statistic on each group (simple) and the same statistic on the "other" groups.

Minimal example:

if (!require(pacman)) install.packages("pacman")
pacman::p_load(dplyr)

df <- data_frame(
    group = c('a', 'a', 'b', 'b', 'c', 'c'),
    value = c(1, 2, 3, 4, 5, 6)
)

res <- df %>%
    group_by(group) %>%
    summarize(
        median        = median(value)
#        median_other  = ... ??? ... # I need the median of all "other"
                                     # groups
#        median_before = ... ??? ... # I need the median of some other groups
                                     # (e.g. those "before" in alphabetical
                                     # order, but any rule that acts as a
                                     # "selection function" depending on
                                     # the current group is fine)
    )

My expected result is the following:

group    median    median_other    median_before
  a        1.5         4.5               NA
  b        3.5         3.5               1.5
  c        5.5         2.5               2.5

I've searched Google for strings similar to "dplyr summarize excluding groups" and "dplyr summarize other then group", and I've searched the dplyr documentation, but I wasn't able to find a solution.

The question "How to summarize value not matching the group using dplyr" does not apply here because its answer works only for sum, i.e. it is a function-specific solution (and it relies on simple arithmetic that does not consider the variability within each group). What about more complex functions (e.g. mean, sd, or a user-defined function)? :-)

Thanks to all

PS: summarize() is just an example; the same question applies to mutate() and other dplyr functions that operate on groups.

You can't just use library(dplyr) instead of the first two lines? — Rich Scriven
If dplyr isn't installed on your system, library(dplyr) returns an error, so to be sure that anyone can run the code I had to write two lines of code anyway, and I decided to use pacman instead, which is a very useful package in my opinion (because you can load, and install if needed, many packages at once with just those two lines of code) — Corrado

donlelek · Accepted Answer · 2016-04-06T22:35:41

Here's my solution:

res <- df %>%
  group_by(group) %>%
  summarise(med_group = median(value),
            med_other = (median(df$value[df$group != group]))) %>% 
  mutate(med_before = lag(med_group))

> res
Source: local data frame [3 x 4]

      group med_group med_other med_before
  (chr)     (dbl)     (dbl)      (dbl)
1     a       1.5       4.5         NA
2     b       3.5       3.5        1.5
3     c       5.5       2.5        3.5

I was trying to come up with an all-dplyr solution, but base R subsetting works just fine: median(df$value[df$group != group]) returns the median of all observations that are not in the current group.
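One thing to watch: lag(med_group) returns the median of the immediately preceding group only, which is why group c shows 3.5 above while the question's expected table shows 2.5 (the median of all values in groups a and b). Here is a minimal sketch of a variant that works with any summary function and any selection rule. It assumes a current dplyr version in which the grouping column can be read inside summarise(); summarise_where is a hypothetical helper name introduced for illustration, not part of dplyr.

```r
library(dplyr)

df <- tibble(
  group = c("a", "a", "b", "b", "c", "c"),
  value = c(1, 2, 3, 4, 5, 6)
)

# Hypothetical helper (not part of dplyr): apply `f` to the values
# selected by the logical vector `keep`; return NA if nothing is selected.
summarise_where <- function(values, keep, f) {
  if (any(keep)) f(values[keep]) else NA_real_
}

res <- df %>%
  group_by(group) %>%
  summarise(
    med_group  = median(value),
    # all rows whose group differs from the current one
    med_other  = summarise_where(df$value, df$group != first(group), median),
    # all rows whose group sorts before the current one
    med_before = summarise_where(df$value, df$group < first(group), median)
  )

print(res)
```

With the example data this should reproduce the expected table from the question (med_before is NA, 1.5, 2.5). Swapping median for mean, sd, or a user-defined function only requires changing the last argument, and the same helper can be called inside mutate().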

I hope this helps you solve your problem.
