
I have the following code, in which I need to take column names from a variable and then perform an operation on selected rows, using the named column in that operation. Here is a simple example that builds column res to match column target:

library(tidyverse)

tst <- tibble(grp = c("a","a","b","b","c","c"), a = rep(2,6), b = rep(4,6), 
              c = rep(8,6), target = c(2,2,4,4,8,8), res = rep(0,6))

# create list of columns to iterate over
lst <-  unique(tst$grp)

# loop over each column, creating a dummy column with 
# the copied value in the matching rows, zeroes elsewhere
for(g in lst) {

  tst <- tst %>% 
    mutate(!!paste("res", g, sep="_") := ifelse(grp == g, !!rlang::sym(g),0)) %>% 
    select(!!paste("res", g, sep="_")) %>% 
    cbind(tst)
}

# combine the dummy columns by rowSum
res <- tst %>% select(starts_with("res_")) %>% mutate(res = rowSums(.)) %>% 
  select(res)

# tidy up the output, result matches the target
tst <- tst %>% select(grp, a, b, c, target) %>% cbind(res)

tst

  grp a b c target res
1   a 2 4 8      2   2
2   a 2 4 8      2   2
3   b 2 4 8      4   4
4   b 2 4 8      4   4
5   c 2 4 8      8   8
6   c 2 4 8      8   8

I've taken an iterative approach: loop over the unique values in the grp column, create a temporary column for each, then add them up with rowSums() to get the final result. Clunky, but it gets there in the end.

I'm sure there is a more elegant way to do this with one of the map family of functions from purrr. Can someone show me how to do it with purrr instead of a loop? I really struggled to get the dynamic column name part working with that approach. Thanks in advance.
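For comparison, the loop's own idea (one dummy vector per group holding the matching column's value and zeroes elsewhere, followed by a row-wise sum) can be written without a for loop using purrr::map() and purrr::reduce(). This is a minimal sketch, not the asker's code: it omits the placeholder res column and looks each column up with tst[[g]] instead of building the name with rlang::sym().

```r
library(tibble)
library(purrr)

tst <- tibble(grp = c("a","a","b","b","c","c"), a = rep(2,6), b = rep(4,6),
              c = rep(8,6), target = c(2,2,4,4,8,8))

# One dummy vector per group: the value of column g where grp == g, 0 elsewhere
dummies <- map(unique(tst$grp), function(g) ifelse(tst$grp == g, tst[[g]], 0))

# Element-wise sum of the dummy vectors, playing the role of the rowSums() step
tst$res <- reduce(dummies, `+`)

tst$res
```

Because exactly one dummy vector is non-zero in each row, the sum equals the value taken from that row's own column.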


4 Answers


You can use imap, which iterates over the values of a vector together with their names. Here the values are the entries of grp, and because grp is unnamed, the "names" are just the positions 1, ..., 6.

In addition, you have to pass the data frame itself as an extra argument (df = .) to imap, which forwards it to its function argument. In total:

tst %>% 
  mutate(res = purrr::imap_dbl(grp, df = ., 
    .f = function(g, i, df) df[i,g][[1]] # [[1]] turns the result from tibble into a double
  )) 
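To see what imap hands to its function, here is a small sketch on a made-up unnamed vector x. Each call receives the element, its position, and any extra named argument given to imap (here sep), which is the same mechanism the answer uses to pass df = . through:

```r
library(purrr)

x <- c("a", "a", "b")  # unnamed, so the second argument is the position 1, 2, 3

# g = element, i = position, sep = extra argument forwarded by imap_chr()
out <- imap_chr(x, function(g, i, sep) paste(g, i, sep = sep), sep = "-")
out
```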

Edit: I timed this solution with a larger table:

tst <- tst[sample(nrow(tst), 50000, TRUE),]

and it takes about 50s.
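One possible reason for the slowness (an assumption, not measured here) is that df[i, g] builds a one-row tibble for every element. A sketch of the same idea that instead pulls the column out as a vector with [[ and then indexes it, using purrr::map2_dbl() over the group names and the row positions:

```r
library(tibble)
library(purrr)

tst <- tibble(grp = c("a","a","b","b","c","c"), a = rep(2,6), b = rep(4,6),
              c = rep(8,6), target = c(2,2,4,4,8,8))

# g = column name for this row, i = row position
tst$res <- map2_dbl(tst$grp, seq_len(nrow(tst)), function(g, i) tst[[g]][i])
tst$res
```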


Something that does not require you to write a loop:

library(tidyverse)

tst <- tibble(grp = c("a","a","b","b","c","c"), a = rep(2,6), b = rep(4,6), 
              c = rep(8,6), target = c(2,2,4,4,8,8), res = rep(0,6))

tst %>% 
  mutate(res = 
           case_when(
             grp == "a" ~ a,
             grp == "b" ~ b,
             grp == "c" ~ c
           ))

# A tibble: 6 x 6
  grp       a     b     c target   res
  <chr> <dbl> <dbl> <dbl>  <dbl> <dbl>
1 a         2     4     8      2     2
2 a         2     4     8      2     2
3 b         2     4     8      4     4
4 b         2     4     8      4     4
5 c         2     4     8      8     8
6 c         2     4     8      8     8

Note: instead of ~ a you can put any expression you need on the right-hand side of ~.
For more help, see ?case_when.
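As an illustration of the note, each right-hand side can be any vectorised expression of the columns, and a final TRUE ~ ... line supplies a fallback for groups that are not listed. The expressions below are arbitrary examples, not part of the original task:

```r
library(tibble)
library(dplyr)

tst <- tibble(grp = c("a","a","b","b","c","c"), a = rep(2,6), b = rep(4,6),
              c = rep(8,6), target = c(2,2,4,4,8,8))

out <- tst %>%
  mutate(res = case_when(
    grp == "a" ~ a * 10,   # arbitrary expression: scale column a
    grp == "b" ~ b + c,    # arbitrary expression: combine two columns
    TRUE       ~ NA_real_  # fallback for any other group (here "c")
  ))

out$res
```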


Here is a base R solution, which is no longer than the others:

# Save all source columns in a matrix. This enables indexing by another matrix
x <- as.matrix(tst[, unique(tst$grp)])
# Matrix of (row, column) pairs to extract from x
i <- cbind(seq_len(nrow(tst)), match(tst$grp, colnames(x)))
tst$res <- x[i]
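The key step is that indexing a matrix with a two-column matrix picks one element per row of the index, treating each row as a (row, column) pair. A tiny sketch with a made-up matrix m:

```r
# A 2 x 3 matrix, filled column by column:
#      a b c
# [1,] 1 3 5
# [2,] 2 4 6
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))

# Column positions looked up by name, as the answer does with match()
cols <- match(c("c", "a"), colnames(m))   # 3 1

# Pairs (1, 3) and (2, 1)
idx <- cbind(1:2, cols)
m[idx]
```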

Edit: Elapsed time for a larger table:

tst <- tst[sample(nrow(tst), 50000, TRUE), ]

It takes between 0.008 s and 0.015 s.
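The same idea can be wrapped in a small helper. The name pick_by_key() and its arguments are hypothetical, not from the answer. It keeps only key values that really are column names, so a key with no matching column gives NA for that row instead of an error (rows of an index matrix that contain NA produce NA):

```r
# Hypothetical helper: for each row, return the value of the column named in df[[key]]
pick_by_key <- function(df, key) {
  # Keep only key values that are actual column names
  cols <- intersect(unique(df[[key]]), names(df))
  x <- as.matrix(df[, cols, drop = FALSE])
  # (row, column) pairs; match() gives NA when a key value has no column
  i <- cbind(seq_len(nrow(df)), match(df[[key]], cols))
  x[i]
}

df <- data.frame(grp = c("a", "b", "z"), a = c(1, 2, 3), b = c(10, 20, 30))
pick_by_key(df, "grp")
```

Note that as.matrix() coerces all picked columns to one common type, so this suits columns of the same type, as in the question.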


Perhaps:

tst %>% 
  # [[ pulls the named column out as a vector, so each call returns a single number
  mutate(res = sapply(seq(nrow(tst)), function(x) tst[[as.character(tst$grp[x])]][x]))


# A tibble: 6 x 6
    grp     a     b     c target   res
  <chr> <dbl> <dbl> <dbl>  <dbl> <dbl>
1     a     2     4     8      2     2
2     a     2     4     8      2     2
3     b     2     4     8      4     4
4     b     2     4     8      4     4
5     c     2     4     8      8     8
6     c     2     4     8      8     8
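On a tibble, single-bracket subsetting such as tst[x, "a"] returns a 1 x 1 tibble rather than a bare number, whereas tst[["a"]][x] returns the number itself. A sketch contrasting the two, plus a vapply() variant whose FUN.VALUE = numeric(1) makes R check that every row yields exactly one number:

```r
library(tibble)

tst <- tibble(grp = c("a","a","b","b","c","c"), a = rep(2,6), b = rep(4,6),
              c = rep(8,6), target = c(2,2,4,4,8,8))

one_bracket <- tst[1, "a"]      # a 1 x 1 tibble
two_bracket <- tst[["a"]][1]    # the double 2

# Row-by-row lookup; vapply() errors if any call does not return one double
res <- vapply(seq_len(nrow(tst)), function(x) tst[[tst$grp[x]]][x], numeric(1))
res
```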