I have to following issue using R. In short I want to create multiple new columns in a data frame based on calculations of different column pairs in the data frame.
The data looks as follows:
df <- data.frame(a1 = c(1:5),
b1 = c(4:8),
c1 = c(10:14),
a2 = c(9:13),
b2 = c(3:7),
c2 = c(15:19))
df
a1 b1 c1 a2 b2 c2
1 4 10 9 3 15
2 5 11 10 4 16
3 6 12 11 5 17
4 7 13 12 6 18
5 8 14 13 7 19
The output is supposed to look like the following:
a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
1 4 10 9 3 15 10 7 25
2 5 11 10 4 16 12 9 27
4 7 13 12 6 18 16 13 31
5 8 14 13 7 19 18 15 33
I can achieve this using dplyr doing some manual work in the following way:
df %>% rowwise %>% mutate(sum_a = sum(a1, a2),
sum_b = sum(b1, b2),
sum_c = sum(c1, c2)) %>%
as.data.frame()
So what is being done is: take columns with the letter "a" in it, calulate the sum rowwise, and create a new column with the sum named sum_[letter]. Repeat for columns with different letters.
This is working, however, if I have a large data set with say 300 different column pairs the manual input would be significant, since I would have to write 300 mutate calls.
I recently stumbled upon the R package "purrr" and my guess is that this would solve my problem of doing what I want in a more automated way.
In particular, I would think to be able to use purrr:map2 to which I pass two lists of column names.
- list1 = all columns with the number 1 in it
- list2 = all columns with the number 2 in it
Then I could calculate the sum of each matching list entry, in the form of:
map2(list1, list2, ~mutate(sum))
However, I am not able to figure out how to best approach this problem using purrr. I am rather new to using purrr, so I would really appreciate any help on this issue.
aa1, aa2, ab1, ab2
etc after you have 54 columns? – Stephen Hendersongroup_by
e.g.slice_by
??? – Stephen Henderson