I have a dataframe (tibble) with multiple columns, and for each column after the first two I'd like to keep the absolute value, but also insert the value relative to the second column. For example, I start with the following data frame (the column names might vary!):
df = tibble(val1 = 5:10, val2 = 10:15, val3 = 15:20); df
# A tibble: 6 x 3
val1 val2 val3
<int> <int> <int>
1 5 10 15
2 6 11 16
3 7 12 17
4 8 13 18
5 9 14 19
6 10 15 20
Now, for the val2
and val3
columns, I also want to insert a column right after val2
and after val3
showing the value relative to val1
. How can I do that???
The resulting tibble should look like:
dfrel = tibble(val1 = 5:10, val2 = 10:15, rel2 = val2/val1, val3 = 15:20, rel3 = val3/val1)
dfrel
# A tibble: 6 x 5
val1 val2 rel2 val3 rel3
<int> <int> <dbl> <int> <dbl>
1 5 10 2.00 15 3.00
2 6 11 1.83 16 2.67
3 7 12 1.71 17 2.43
4 8 13 1.62 18 2.25
5 9 14 1.56 19 2.11
6 10 15 1.50 20 2.00
Unfortunately, I'm unable to write the proper mutate_at call to insert that relative column right after each value column. In fact, I'm unable to write a mutate_at using funs() that modifies a column by accessing other columns (by position rather than name).
Replacing val2 and val3 by the relative values works (using a lambda function instead of funs), but does not preserve the original val2 and val3 columns as required:
df %>%
mutate_at(vars(-1), function(v) v/.[[1]])
# A tibble: 6 x 3
val1 val2 val3
<int> <dbl> <dbl>
1 5 2.00 3.00
2 6 1.83 2.67
3 7 1.71 2.43
4 8 1.62 2.25
5 9 1.56 2.11
6 10 1.50 2.00
All my attempts to use funs()
fail:
df %>%
mutate_at(vars(-1), funs(./.tbl[[1]]))
Error in mutate_impl(.data, dots) :
Evaluation error: object '.tbl' not found.
df %>%
mutate_at(vars(-1), funs(function(v) v/.[[1]]))
Error in mutate_impl(.data, dots) :
Column `val2` is of unsupported type function
One complication compared to Using functions of multiple columns in a dplyr mutate_at call is that my val1 column does not have a fixed name (i.e. it is not always called val1
), so I cannot use it by name in the funs arguments. The other complication is that the tibble is created on the fly (using lots of pipe operators) and is typically not stored in a variable, so I cannot simply divide by df[[1]]...
So, what is the proper dplyr approach to insert relative columns (i.e. percent of the first column) after each column?