3
votes

Using dplyr, I want to divide each column in one group by the matching column in another group, where the paired columns share a naming pattern. I have the following data frame:

My_data = data.frame(
  var_a = 101:110,
  var_b = 201:210,
  number_a = 1:10,
  number_b = 21:30)

I would like to create new variables: var_a_new = var_a/number_a, var_b_new = var_b/number_b, and so on for c, d, etc. if those columns exist.

My_data %>%
  mutate_at(
    .vars = c('var_a', 'var_b'),
    .funs = list( new = function(x) x/(.[,paste0('number_a', names(x))]) ))

I did not get an error, but the result is wrong. I think the problem is that I don't understand what 'x' is. Is it one of the strings in .vars? Is it a column in My_data? Something else?

3 Answers

4
votes

One option could be:

bind_cols(My_data,
          My_data %>%
           transmute(across(starts_with("var"))/across(starts_with("number"))) %>%
           rename_all(~ paste0(., "_new")))

   var_a var_b number_a number_b var_a_new var_b_new
1    101   201        1       21 101.00000  9.571429
2    102   202        2       22  51.00000  9.181818
3    103   203        3       23  34.33333  8.826087
4    104   204        4       24  26.00000  8.500000
5    105   205        5       25  21.00000  8.200000
6    106   206        6       26  17.66667  7.923077
7    107   207        7       27  15.28571  7.666667
8    108   208        8       28  13.50000  7.428571
9    109   209        9       29  12.11111  7.206897
10   110   210       10       30  11.00000  7.000000
0
votes

You can do this directly, provided the columns are correctly ordered, meaning that "var_a" is the first column in the "var" group and "number_a" is the first column in the "number" group, and so on for the other pairs.

var_cols <- grep('var', names(My_data), value = TRUE)
number_cols <- grep('number', names(My_data), value = TRUE)

My_data[paste0(var_cols, '_new')] <- My_data[var_cols]/My_data[number_cols]
My_data

#   var_a var_b number_a number_b var_a_new var_b_new
#1    101   201        1       21 101.00000  9.571429
#2    102   202        2       22  51.00000  9.181818
#3    103   203        3       23  34.33333  8.826087
#4    104   204        4       24  26.00000  8.500000
#5    105   205        5       25  21.00000  8.200000
#6    106   206        6       26  17.66667  7.923077
#7    107   207        7       27  15.28571  7.666667
#8    108   208        8       28  13.50000  7.428571
#9    109   209        9       29  12.11111  7.206897
#10   110   210       10       30  11.00000  7.000000
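If the column order cannot be relied on, the pairs can be matched by name instead. The following is a minimal sketch, assuming every var_* column has a number_* column with the same suffix; the shuffled column order and the suffixes variable are only for illustration.

```r
# Same data as in the question, but with the columns deliberately shuffled
My_data <- data.frame(
  number_b = 21:30,
  var_a = 101:110,
  number_a = 1:10,
  var_b = 201:210)

# Suffixes shared by each var_*/number_* pair ("a", "b", ...)
suffixes <- sub('^var_', '', grep('^var_', names(My_data), value = TRUE))

# Build each pair's column names from the suffix, so position does not matter
for (s in suffixes) {
  My_data[[paste0('var_', s, '_new')]] <-
    My_data[[paste0('var_', s)]] / My_data[[paste0('number_', s)]]
}
```

Since each division looks up both columns by name, a missing number_* partner shows up as an error rather than a silent mismatch.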
0
votes

The function across() has replaced the scoped variants such as mutate_at(), summarize_at() and others. For more details, see vignette("colwise") or https://cran.r-project.org/web/packages/dplyr/vignettes/colwise.html. Based on tmfmnk's answer, the following works well:

My_data %>% 
  mutate(
    new = across(starts_with("var"))/across(starts_with("number")))

Because the result is assigned to new, it is stored as a single data-frame column, and its inner columns are printed with the prefix "new." (new.var_a, new.var_b).

   var_a var_b number_a number_b new.var_a new.var_b
1    101   201        1       21 101.00000  9.571429
2    102   202        2       22  51.00000  9.181818
3    103   203        3       23  34.33333  8.826087
4    104   204        4       24  26.00000  8.500000
5    105   205        5       25  21.00000  8.200000
6    106   206        6       26  17.66667  7.923077
7    107   207        7       27  15.28571  7.666667
8    108   208        8       28  13.50000  7.428571
9    109   209        9       29  12.11111  7.206897
10   110   210       10       30  11.00000  7.000000
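As for what x is in the question: the function passed to mutate_at() or across() receives the column's values (a vector), not its name, so names(x) is NULL there. Inside across(), the current column's name is available through cur_column(). Below is a possible sketch that pairs each var_* column with its number_* partner by name, assuming the var_/number_ naming scheme from the question; the result variable is only for illustration.

```r
library(dplyr)

My_data <- data.frame(
  var_a = 101:110,
  var_b = 201:210,
  number_a = 1:10,
  number_b = 21:30)

result <- My_data %>%
  mutate(across(
    starts_with("var_"),
    # .x holds the column's values; cur_column() gives its name, e.g. "var_a",
    # which is turned into "number_a" and looked up among the columns
    ~ .x / get(sub("^var_", "number_", cur_column())),
    .names = "{.col}_new"))
```

The .names argument yields ordinary columns named var_a_new and var_b_new, and the pairing does not depend on the column order.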