Dplyr & purrr with dynamic column names, select & copy by group

Question

I've got the following code, where I need to take column names from a variable and then perform an operation on selected rows, using the named column in that operation. Here is a simple example that creates column res to match column target:

library(tidyverse)

tst <- tibble(grp = c("a","a","b","b","c","c"), a = rep(2,6), b = rep(4,6), 
              c = rep(8,6), target = c(2,2,4,4,8,8), res = rep(0,6))

# create list of columns to iterate over
lst <-  unique(tst$grp)

# loop over each column, creating a dummy column with 
# the copied value in the matching rows, zeroes elsewhere
for(g in lst) {

  tst <- tst %>% 
    mutate(!!paste("res", g, sep="_") := ifelse(grp == g, !!rlang::sym(g),0)) %>% 
    select(!!paste("res", g, sep="_")) %>% 
    cbind(tst)
}

# combine the dummy columns by rowSum
res <- tst %>% 
  select(starts_with("res_")) %>% 
  mutate(res = rowSums(.)) %>% 
  select(res)

# tidy up the output, result matches the target
tst <- tst %>% select(grp, a, b, c, target) %>% cbind(res)

tst

  grp a b c target res
1   a 2 4 8      2   2
2   a 2 4 8      2   2
3   b 2 4 8      4   4
4   b 2 4 8      4   4
5   c 2 4 8      8   8
6   c 2 4 8      8   8

I've taken an iterative approach, looping through the unique values in the grp column, creating temporary columns and then summing them with rowSums() to get the final result. Clunky, but it got there in the end.

I'm sure there is a more elegant way to do it with one of the map family from purrr. Can someone show me how I can do this without the loop using purrr? I really struggled to get the dynamic column name bit working using this approach. Thanks in advance.

akraf · Accepted Answer · 2018-06-05T09:50:03

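You can use imap, which iterates over the elements of a vector together with their names, or their positions when the vector is unnamed. Here the elements are the values of grp and, since grp has no names, the second argument is just the position 1,...,6.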

In addition, you have to pass the data frame itself as an additional argument (df =) to imap, which forwards it to its function argument. In total:

tst %>% 
  mutate(res = purrr::imap_dbl(grp, df = ., 
    .f = function(g, i, df) df[i,g][[1]] # [[1]] turns the result from tibble into a double
  ))
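
To make the roles of g and i concrete, here is a minimal self-contained sketch on the question's data. The closure form below (the function reads tst directly instead of receiving it through df =) is a variation chosen for illustration, not the answer's exact code, and the names pairs and res are only for this example.

```r
library(purrr)
library(tibble)

# Same data as in the question, without the placeholder res column.
tst <- tibble(grp = c("a","a","b","b","c","c"), a = rep(2,6), b = rep(4,6),
              c = rep(8,6), target = c(2,2,4,4,8,8))

# For an unnamed vector, imap passes each element (g) and its position (i).
pairs <- imap_chr(c("a", "b"), function(g, i) paste(i, g))

# Illustrative closure variant: pick column g, then take its i-th value.
res <- imap_dbl(tst$grp, function(g, i) tst[[g]][[i]])

# res should equal tst$target row by row.
all(res == tst$target)
```

Indexing with tst[[g]][[i]] returns a plain number directly, so the trailing [[1]] used to unwrap a one-cell tibble is not needed in this form.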

Edit: I timed this solution with a larger table:

tst <- tst[sample(nrow(tst), 50000, TRUE),]

and it takes about 50s.
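
If the per-row lookup is too slow, one possible alternative is base R matrix indexing, which picks one cell per row in a single vectorised step. This is a different technique from imap, sketched here under the assumption that all the columns named in grp are numeric. The names cols, m and idx are hypothetical, and no timing is claimed for it.

```r
# Base R only; a plain data.frame is used here, and the same indexing
# should apply to a tibble with numeric columns.
tst <- data.frame(grp = c("a","a","b","b","c","c"), a = rep(2,6), b = rep(4,6),
                  c = rep(8,6), target = c(2,2,4,4,8,8))

# Columns that grp can refer to, in a fixed order.
cols <- unique(tst$grp)

# Numeric matrix holding just those columns.
m <- as.matrix(tst[cols])

# Two-column index: row number, and the position of that row's grp in cols.
idx <- cbind(seq_len(nrow(tst)), match(tst$grp, cols))

# Indexing a matrix by a two-column matrix returns one cell per index row.
tst$res <- m[idx]

all(tst$res == tst$target)
```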
