Summarize values by group, but keep original data

Question

I am trying to figure out how to sum the values belonging to categories a and b within each level of the factor file, while also keeping the original data.

library(dplyr)
df <- data.frame(ID = 1:20, values = runif(20), category = rep(letters[1:5], 4), file = as.factor(sort(rep(1:5, 4)))) 


   ID     values category file
1   1 0.65699229        a    1
2   2 0.70506478        b    1
3   3 0.45774178        c    1
4   4 0.71911225        d    1
5   5 0.93467225        e    1
6   6 0.25542882        a    2
7   7 0.46229282        b    2
8   8 0.94001452        c    2
9   9 0.97822643        d    2
10 10 0.11748736        e    2
11 11 0.47499708        a    3
12 12 0.56033275        b    3
13 13 0.90403139        c    3
14 14 0.13871017        d    3
15 15 0.98889173        e    3
16 16 0.94666823        a    4
17 17 0.08243756        b    4
18 18 0.51421178        c    4
19 19 0.39020347        d    4
20 20 0.90573813        e    4

so that

df[1,2] will be added to df[2,2] to category 'ab' for file 1
df[6,2] will be added to df[7,2] to category 'ab' for file 2
etc.

So far I have this:

df %>% 
    filter(category %in% c('a' , 'b')) %>%
    group_by(file) %>% 
    summarise(values = sum(values))

Problem

I would like to change the category of the summed values to "ab" and append it to the original data frame in the same pipeline.

Desired output:

   ID     values category file
1   1 0.65699229        a    1
2   2 0.70506478        b    1
3   3 0.45774178        c    1
4   4 0.71911225        d    1
5   5 0.93467225        e    1
6   6 0.25542882        a    2
7   7 0.46229282        b    2
8   8 0.94001452        c    2
9   9 0.97822643        d    2
10 10 0.11748736        e    2
11 11 0.47499708        a    3
12 12 0.56033275        b    3
13 13 0.90403139        c    3
14 14 0.13871017        d    3
15 15 0.98889173        e    3
16 16 0.94666823        a    4
17 17 0.08243756        b    4
18 18 0.51421178        c    4
19 19 0.39020347        d    4
20 20 0.90573813        e    4
21 21 1.25486225       ab    1
22 22 1.87216325       ab    2
23 23 1.36548126       ab    3

Maybe setDT(df)[category %chin% c('a','b'), summed:=sum(values), file] - chinsoon12
@chinsoon12 %chin% actually won't work here (assuming the default stringsAsFactors option) because category is a factor and not character. You should post this as an answer though - IceCreamToucan

Daniel Fischer · Accepted Answer · 2019-07-10T00:07:35

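This will get you the result. It sums the a and b values within each file, keeps one row per file relabelled 'ab', and appends those rows to the original data: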

df %>% bind_rows(
  df %>% 
    filter(category %in% c('a' , 'b')) %>%
    group_by(file) %>% 
    mutate(values = sum(values), category = paste0(category,collapse='')) %>% 
    filter(row_number() == 1 & n() > 1)
) %>% mutate(ID = row_number())
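
A shorter variant, offered as a sketch rather than a drop-in replacement: because the question already collapses each file with summarise(), the summary rows can be labelled 'ab' directly and then bound to the original data. The fixed seed, stringsAsFactors = FALSE, and the names ab_rows and result are assumptions made for illustration. Note one behavioural difference: this version also emits an 'ab' row for a file that contains only one of the two categories, whereas the n() > 1 filter above drops such files.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible
df <- data.frame(
  ID = 1:20,
  values = runif(20),
  category = rep(letters[1:5], 4),
  file = as.factor(sort(rep(1:4, 5))),
  stringsAsFactors = FALSE  # keep category as character so bind_rows() does not coerce
)

# One summary row per file: the sum of the 'a' and 'b' values, labelled 'ab'
ab_rows <- df %>%
  filter(category %in% c("a", "b")) %>%
  group_by(file) %>%
  summarise(values = sum(values), category = "ab")

# Append the summary rows to the original data and renumber ID
result <- df %>%
  bind_rows(ab_rows) %>%
  mutate(ID = row_number())
```

Here ab_rows has no ID column, so bind_rows() fills it with NA before the final mutate() overwrites ID for every row.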

BTW, the code to produce the data frame shown in the example is this one:

df <- data.frame(ID = 1:20, values = runif(20), category = rep(letters[1:5], 4), file = as.factor(sort(rep(1:4, 5))))

Now let's say you want to sum multiple columns. You need to provide their names in a vector:

cols <- c("values") # columns to be summed

df %>% bind_rows(
  df %>% 
    filter(category %in% c('a' , 'b')) %>%
    group_by(file) %>% 
    mutate_at(vars(cols), sum) %>% 
    mutate(category = paste0(category,collapse='')) %>% 
    filter(row_number() == 1 & n() > 1)
) %>% mutate(ID = row_number())
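
To make the multi-column case concrete, here is a minimal sketch with two summed columns. It assumes dplyr 1.0 or later, where across() and all_of() are available; the weights column, the fixed seed, and the names df2 and result2 are hypothetical and only there for illustration.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible
df2 <- data.frame(
  ID = 1:20,
  values = runif(20),
  weights = runif(20),  # hypothetical second numeric column
  category = rep(letters[1:5], 4),
  file = as.factor(sort(rep(1:4, 5))),
  stringsAsFactors = FALSE
)

cols <- c("values", "weights")  # columns to be summed

result2 <- df2 %>%
  bind_rows(
    df2 %>%
      filter(category %in% c("a", "b")) %>%
      group_by(file) %>%
      # sum every column named in cols and label the row 'ab'
      summarise(across(all_of(cols), sum), category = "ab")
  ) %>%
  mutate(ID = row_number())
```

Wrapping the vector in all_of() makes it explicit that cols is an external character vector of column names rather than a column of the data frame.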
