3
votes

I have a dataframe (tibble) with multiple columns, and for each column after the first two I'd like to keep the absolute value, but also insert the value relative to the second column. For example, I start with the following data frame (the column names might vary!):

df = tibble(val1 = 5:10, val2 = 10:15, val3 = 15:20); df
# A tibble: 6 x 3
   val1  val2  val3
  <int> <int> <int>
1     5    10    15
2     6    11    16
3     7    12    17
4     8    13    18
5     9    14    19
6    10    15    20

Now, for the val2 and val3 columns, I also want to insert a column right after val2 and after val3 showing the value relative to val1. How can I do that???

The resulting tibble should look like:

dfrel = tibble(val1 = 5:10, val2 = 10:15, rel2 = val2/val1, val3 = 15:20, rel3 = val3/val1)
dfrel
# A tibble: 6 x 5
   val1  val2  rel2  val3  rel3
  <int> <int> <dbl> <int> <dbl>
1     5    10  2.00    15  3.00
2     6    11  1.83    16  2.67
3     7    12  1.71    17  2.43
4     8    13  1.62    18  2.25
5     9    14  1.56    19  2.11
6    10    15  1.50    20  2.00

Unfortunately, I'm unable to write the proper mutate_at call to insert that relative column right after each value column. In fact, I'm unable to write a mutate_at using funs() that modifies a column by accessing other columns (by position rather than name).

Replacing val2 and val3 by the relative values works (using a lambda function instead of funs), but does not preserve the original val2 and val3 columns as required:

df %>%
     mutate_at(vars(-1), function(v) v/.[[1]])
# A tibble: 6 x 3
   val1  val2  val3
  <int> <dbl> <dbl>
1     5  2.00  3.00
2     6  1.83  2.67
3     7  1.71  2.43
4     8  1.62  2.25
5     9  1.56  2.11
6    10  1.50  2.00

All my attempts to use funs() fail:

df %>%
     mutate_at(vars(-1), funs(./.tbl[[1]]))
Error in mutate_impl(.data, dots) : 
  Evaluation error: object '.tbl' not found.

df %>%
     mutate_at(vars(-1), funs(function(v) v/.[[1]]))
Error in mutate_impl(.data, dots) : 
  Column `val2` is of unsupported type function

One complication compared to Using functions of multiple columns in a dplyr mutate_at call is that my val1 column does not have a fixed name (i.e. it is not always called val1), so I cannot use it by name in the funs arguments. The other complication is that the tibble is created on the fly (using lots of pipe operators) and is typically not stored in a variable, so I cannot simply divide by df[[1]]...

So, what is the proper dplyr approach to insert relative columns (i.e. percent of the first column) after each column?

2

2 Answers

1
votes

Give your function a name by wrapping it in a list, so mutate_at will create new columns. Something like following (the column names may not be as ideal so you might need to rename them if needed):

df %>% mutate_at(vars(-1), list(rel = function(v) v / .[[1]]))

# A tibble: 6 x 5
#   val1  val2  val3 val2_rel val3_rel
#  <int> <int> <int>    <dbl>    <dbl>
#1     5    10    15     2.00     3.00
#2     6    11    16     1.83     2.67
#3     7    12    17     1.71     2.43
#4     8    13    18     1.62     2.25
#5     9    14    19     1.56     2.11
#6    10    15    20     1.50     2.00
0
votes

With Psidom's help, this is my final solution to the problem:

interleaveColumns = function(v) { 
    c(1, unlist(split(2:length(v), 1:((length(v)-1)/2)), use.names = FALSE)) 
}

df = tibble(val1 = 5:10, val2 = 10:15, val3 = 15:20, val4 = 25:30, val5 = 1:6);

# mutate_at can be given a named list to create a new column 
# for each existing columnt (appended to the end => we need 
# to reorder the columns and interleave the new columns with 
# the old columns using the interleaveColumns function)

df %>%
     mutate_at(vars(-1), list(rel = function(v) v/.[[1]])) %>% 
     select(interleaveColumns(.))

# A tibble: 6 x 9
   val1  val2 val2_rel  val3 val3_rel  val4 val4_rel  val5 val5_rel
  <int> <int>    <dbl> <int>    <dbl> <int>    <dbl> <int>    <dbl>
1     5    10     2.00    15     3.00    25     5.00     1    0.200
2     6    11     1.83    16     2.67    26     4.33     2    0.333
3     7    12     1.71    17     2.43    27     3.86     3    0.429
4     8    13     1.62    18     2.25    28     3.50     4    0.500
5     9    14     1.56    19     2.11    29     3.22     5    0.556
6    10    15     1.50    20     2.00    30     3.00     6    0.600