How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs

Question

I have to following issue using R. In short I want to create multiple new columns in a data frame based on calculations of different column pairs in the data frame.

The data looks as follows:

df <- data.frame(a1 = c(1:5), 
                 b1 = c(4:8), 
                 c1 = c(10:14), 
                 a2 = c(9:13), 
                 b2 = c(3:7), 
                 c2 = c(15:19))
df
a1 b1 c1 a2 b2 c2
1  4 10  9  3 15
2  5 11 10  4 16
3  6 12 11  5 17
4  7 13 12  6 18
5  8 14 13  7 19

The output is supposed to look like the following:

a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
1  4 10  9  3 15    10     7    25
2  5 11 10  4 16    12     9    27
4  7 13 12  6 18    16    13    31
5  8 14 13  7 19    18    15    33

I can achieve this using dplyr doing some manual work in the following way:

df %>% rowwise %>% mutate(sum_a = sum(a1, a2),
                          sum_b = sum(b1, b2),
                          sum_c = sum(c1, c2)) %>% 
  as.data.frame()

So what is being done is: take columns with the letter "a" in it, calulate the sum rowwise, and create a new column with the sum named sum_[letter]. Repeat for columns with different letters.

This is working, however, if I have a large data set with say 300 different column pairs the manual input would be significant, since I would have to write 300 mutate calls.

I recently stumbled upon the R package "purrr" and my guess is that this would solve my problem of doing what I want in a more automated way.

In particular, I would think to be able to use purrr:map2 to which I pass two lists of column names.

list1 = all columns with the number 1 in it
list2 = all columns with the number 2 in it

Then I could calculate the sum of each matching list entry, in the form of:

map2(list1, list2, ~mutate(sum))

However, I am not able to figure out how to best approach this problem using purrr. I am rather new to using purrr, so I would really appreciate any help on this issue.

Are the column names going to become ... aa1, aa2, ab1, ab2 etc after you have 54 columns? — Stephen Henderson
I see the answer has been edited to reflect the above query. On the lack of a tidy solution... I think maybe there could be something like the transpose of group_by e.g. slice_by ??? — Stephen Henderson
Thank you all very much. I used a classic tidyverse approach of group_by, gather, spread, and summing up on the way (very similar to what was proposed below by "Lorenzo G" and "G. Grothendieck" in answer #1). I never worked with slice_by but I guess that would work nicely as well. I wanted to use a map approach to make the code even shorter and standardized, and the solution proposed by "akrun" perfectly fits that need. Thank you again! — user30276

akrun akrun · Accepted Answer · 2018-04-13T12:42:12

Here is one option with purrr. We get the unique prefix of the names of the dataset ('nm1'), use map (from purrr) to loop through the unique names, select the column that matches the prefix value of 'nm1', add the rows using reduce and the bind the columns (bind_cols) with the original dataset

library(tidyverse)
nm1 <- names(df) %>% 
          substr(1, 1) %>%
          unique 
nm1 %>% 
     map(~ df %>% 
            select(matches(.x)) %>%
            reduce(`+`)) %>%
            set_names(paste0("sum_", nm1)) %>%
     bind_cols(df, .)
#    a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
#1  1  4 10  9  3 15    10     7    25
#2  2  5 11 10  4 16    12     9    27
#3  3  6 12 11  5 17    14    11    29
#4  4  7 13 12  6 18    16    13    31
#5  5  8 14 13  7 19    18    15    33

How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs

8 Answers