2
votes

Suppose I have a grouped data frame:

> mtcars %>% 
+   group_by(cyl) %>% 
+   summarise(blah = mean(disp))
# A tibble: 3 x 2
    cyl  blah
  <dbl> <dbl>
1     4  105.
2     6  183.
3     8  353.

Then suppose I want to sum some existing variables:

> mtcars %>% 
+   group_by(cyl) %>% 
+   summarise_at(vars(vs:carb), sum)
# A tibble: 3 x 5
    cyl    vs    am  gear  carb
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     4    10     8    45    17
2     6     4     3    27    24
3     8     0     2    46    49

However, if I want to add both summarise commands together, I cannot:

> mtcars %>% 
+   group_by(cyl) %>% 
+   summarise_at(vars(vs:carb), sum) %>% 
+   summarise(blah = mean(disp))
Error in mean(disp) : object 'disp' not found
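
The error seems to arise because summarise_at() returns only the grouping column and the summed columns, so disp no longer exists by the time the second summarise() runs. A minimal sketch to confirm this (the name `summed` is only illustrative and is not part of the original code):

```r
library(dplyr)

# Result of the first summary step only
summed <- mtcars %>%
  group_by(cyl) %>%
  summarise_at(vars(vs:carb), sum)

# Only the grouping column and the summed columns remain
names(summed)
#> [1] "cyl"  "vs"   "am"   "gear" "carb"

# disp has been dropped, so a later summarise() cannot find it
"disp" %in% names(summed)
#> [1] FALSE
```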

After using group_by() in a dplyr chain, how can I add new features with summarise() as well as sum existing features as above with summarise_at(vars(vs:carb), sum)?

4 Answers

3
votes

The only way I can think of (at the moment) is to store the grouped data immediately before the first summary, run the two summary verbs separately, and join the results on the grouping variable. For instance:

library(dplyr)

grouped_data <- group_by(mtcars, cyl)
left_join(
  summarize(grouped_data, blah = mean(disp)),
  summarize_at(grouped_data, vars(vs:carb), sum),
  by = "cyl")
# # A tibble: 3 x 6
#     cyl  blah    vs    am  gear  carb
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1     4  105.    10     8    45    17
# 2     6  183.     4     3    27    24
# 3     8  353.     0     2    46    49
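
As a possible alternative to the join, and assuming dplyr 1.0.0 or later is installed (the answers here predate it), across() can be used inside a single summarise() call alongside ordinary named summaries. This is a sketch under that assumption; the name `result` is only illustrative:

```r
library(dplyr)  # across() assumes dplyr >= 1.0.0

result <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    across(vs:carb, sum),  # sum the existing columns vs through carb
    blah = mean(disp)      # new summary column from the untouched disp
  )

result
```

Because disp is not among the columns handled by across(), it is still available to mean() within the same summarise() call, so no intermediate object or join is needed.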
1
votes

You can left_join() the result of summarise_at() with the data frame produced by a separate summarise() call.

library(dplyr)

data(mtcars)

mtcars %>% 
  group_by(cyl) %>% 
  summarise_at(vars(vs:carb), sum) %>% 
  left_join(mtcars %>% group_by(cyl) %>% summarise(blah = mean(disp)))
#Joining, by = "cyl"
## A tibble: 3 x 6
#    cyl    vs    am  gear  carb  blah
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1     4    10     8    45    17  105.
#2     6     4     3    27    24  183.
#3     8     0     2    46    49  353.
1
votes

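What I would do is use mutate_at() for the first step, so that the other columns are not collapsed, and then use summarise_at() with mean for all the columns together. After mutate_at(), each summed column is constant within its group, so taking its mean simply returns the sum.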

library(dplyr) 

mtcars %>% 
   group_by(cyl) %>% 
   mutate_at(vars(vs:carb), sum) %>%
   summarise_at(vars(vs:carb, disp), mean)

#    cyl    vs    am  gear  carb  disp
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1     4    10     8    45    17  105.
#2     6     4     3    27    24  183.
#3     8     0     2    46    49  353.
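
A quick way to check that the mutate-then-mean approach matches a direct per-group sum is to compare the two results. This is only a sketch; the names `via_mutate` and `direct` are illustrative and not from the original answer:

```r
library(dplyr)

# Approach from this answer: sum in place, then take the mean of everything
via_mutate <- mtcars %>%
  group_by(cyl) %>%
  mutate_at(vars(vs:carb), sum) %>%
  summarise_at(vars(vs:carb, disp), mean)

# Direct per-group sums for comparison
direct <- mtcars %>%
  group_by(cyl) %>%
  summarise_at(vars(vs:carb), sum)

# Compare the shared columns (cyl, vs, am, gear, carb) of both results
same_sums <- all.equal(
  as.data.frame(via_mutate[names(direct)]),
  as.data.frame(direct)
)
same_sums
```

Note that this trick relies on every non-summed column being summarised with the same function (mean here), and the new column keeps its original name (disp) rather than a custom one such as blah.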
0
votes

Here's a way. We need to define a helper function first; it works only inside a pipe chain and relies on unexported functions from dplyr, so it might break one day.

.at <- function(.vars, .funs, ...) {
  # make sure we are in a piped call
  in_a_piped_fun <- exists(".",parent.frame()) &&
    length(ls(envir=parent.frame(), all.names = TRUE)) == 1
  if (!in_a_piped_fun)
    stop(".at() must be called as an argument to a piped function")
  # borrow code from summarize_at
  .tbl <- try(eval.parent(quote(.)))
  dplyr:::manip_at(
    .tbl, .vars, .funs, rlang::enquo(.funs), rlang:::caller_env(),
    .include_group_vars = TRUE, ...)
}

library(dplyr, warn.conflicts = FALSE)
mtcars %>%
  summarize(!!!.at(vars(vs:carb), sum),  blah = mean(disp))
#>   vs am gear carb     blah
#> 1 14 13  118   90 230.7219

Created on 2019-11-17 by the reprex package (v0.3.0)