In dplyr, group_by has a parameter add
, and if it's true, it adds to the group_by. For example:
data <- data.frame(a=c('a','b','c'), b=c(1,2,3), c=c(4,5,6))
data <- data %>% group_by(a, add=TRUE)
data <- data %>% group_by(b, add=TRUE)
data %>% summarize(sum_c = sum(c))
Output:
a b sum_c
1 a 1 4
2 b 2 5
3 c 3 6
Is there an analogous way to add summary variables to a summarize statement? I have some complicated conditionals (with dbplyr
) where if x=TRUE
I want to add
variable x_v
to the summary.
I see several related stackoverflow questions, but I didn't see this.
EDIT: Here is some precise example code, but simplified from the real code (which has more than two conditionals).
summarize_num <- TRUE
summarize_num_distinct <- FALSE
data <- data.frame(val=c(1,2,2))
if (summarize_num && summarize_num_distinct) {
summ <- data %>% summarize(n=n(), n_unique=n_distinct())
} else if (summarize_num) {
summ <- data %>% summarize(n=n())
} else if (summarize_num_distinct) {
summ <- data %>% summarize(n_unique=n_distinct())
}
Depending on conditions (summarize_num, and summarize_num_distinct here), the eventual summary (summ here) has different columns.
As the number of conditions goes up, the number of clauses goes up combinatorially. However, the conditions are independent, so I'd like to add the summary variables independently as well.
I'm using dbplyr
, so I have to do it in a way that it can get translated into SQL.
dplyr::mutate
instead ofsummarise
work, i.e. you get the per group effect and can then filter based on any condition? – biomiha