
Suppose you want to calculate how far a patient's BMI is from the population median using a Z-score. The Z-score is calculated from the patient's BMI plus three age- and sex-dependent variables, which are looked up in a table.

So I created a function that takes age, sex, and BMI as inputs. It uses sex to find the appropriate table (male or female), age to find the appropriate row in that table, and then BMI in a calculation that incorporates the age- and sex-specific variables just looked up. My function works when I enter the values manually, but I can't figure out how to go through each row of the data frame and apply my function, using other columns of that row as inputs.

For the sake of simplicity, I use just two age- and sex-dependent variables below (median BMI and a multiplier):

library(dplyr)  # provides tibble(), mutate(), and %>%

### make master dataframe
study_id <- c(1001, 1002, 1003, 1004, 1005)
age <- c(4, 3, 3, 1, 5)
sex <- c(1, 1, 2, 2, 1)
df <- tibble(study_id, age, sex)

### reference male table
age_m <- c(1, 2, 3, 4, 5)
median_bmi_m <- c(14.9, 16.3, 16.9, 17.2, 17.3)
multiplier_m <- c(22, 23, 43, 11, 33)
reference_male <- tibble(age_m, median_bmi_m, multiplier_m)

### reference female table
age_f <- c(1, 2, 3, 4, 5)
median_bmi_f <- c(15.9, 17.3, 17.9, 18.2, 18.3)
multiplier_f <- c(12, 13, 33, 21, 23)
reference_female <- tibble(age_f, median_bmi_f, multiplier_f)

### my function
toy_function <- function(age, sex) {
  if(sex == 1) {
    a <- reference_male[age, 2]
    b <- reference_male[age, 3]
    c <- a*b
  } else {
    a <- reference_female[age, 2]
    b <- reference_female[age, 3]
    c <- a*b
  }
  return(as.numeric(c))
}

The function returns a numeric value "c" that I want to compute row-wise for each patient. I built a for loop that does it, but I suspect there is a more elegant way using purrr or the apply() family. I tried simply putting the function inside mutate(), but I get an error.

df <- df %>%
   mutate(new_column = toy_function(age, sex))

Error in toy_function(age_df, sex) : 
  'list' object cannot be coerced to type 'double'
In addition: Warning message:
In if (sex == 1) { :
  the condition has length > 1 and only the first element will be used

Thanks for your help. I still don't have a good handle on purrr and other row-wise iteration strategies.

UPDATE

Thanks all for the answers. While the offered solution to the original toy example worked, when I went back to my original more complex function (with three inputs instead of two), I am getting an error message.

Suppose we update the function and the original dataframe to incorporate BMI as follows:

### updated dataframe with BMI variable
study_id <- c(1001, 1002, 1003, 1004, 1005)
age <- c(4, 3, 3, 1, 5)
sex <- c(1, 1, 2, 2, 1)
bmi <- c(15, 16, 17, 18, 19)
df <- tibble(study_id, age, sex, bmi)

### updated function with bmi variable incorporated into the equation

toy_function <- function(age, sex, bmi) {
  if(sex == 1) {
    a <- reference_male[age, 2]
    b <- reference_male[age, 3]
    c <- a*b*bmi
  } else {
    a <- reference_female[age, 2]
    b <- reference_female[age, 3]
    c <- a*b*bmi
  }
  return(as.numeric(c))
}

When I run the solution code as such, I get the following error:

df %>%
  mutate(new_column = map2_dbl(age, sex, bmi, ~ toy_function(..1, ..2, ..3)))

Result 1 must be a single double, not NULL of length 0

It appears that there is something that I'm doing incorrectly as I add in the third variable. Note: I read that the ..1, ..2, ..3 syntax may be preferred when you have multiple variables in the function, but I may be mistaken.

2 Answers

3
votes

The function is built around if/else, which is not vectorized, so it can only handle one value of sex at a time. We can wrap it with Vectorize() and then apply it to the whole columns:

library(dplyr)
df %>%
     mutate(new_column = Vectorize(toy_function)(age, sex))

-output

# A tibble: 5 x 4
  study_id   age   sex new_column
     <dbl> <dbl> <dbl>      <dbl>
1     1001     4     1       189.
2     1002     3     1       727.
3     1003     3     2       591.
4     1004     1     2       191.
5     1005     5     1       571.
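
For context, Vectorize() is essentially a wrapper around mapply(): the scalar function is called once per pair of elements and the results are simplified into a vector. The base R sketch below illustrates this under a few assumptions: plain data frames stand in for the tibbles in the question, the names ref_m, ref_f, and lookup_product are hypothetical, and age is assumed to equal the row number of the reference table (as in the question's toy data).

```r
# Stand-ins for the question's reference tables (plain data frames, an assumption)
ref_m <- data.frame(age = 1:5,
                    median_bmi = c(14.9, 16.3, 16.9, 17.2, 17.3),
                    multiplier = c(22, 23, 43, 11, 33))
ref_f <- data.frame(age = 1:5,
                    median_bmi = c(15.9, 17.3, 17.9, 18.2, 18.3),
                    multiplier = c(12, 13, 33, 21, 23))

# Hypothetical scalar function: `if` needs a length-1 condition,
# so this handles one patient at a time
lookup_product <- function(age, sex) {
  ref <- if (sex == 1) ref_m else ref_f
  ref$median_bmi[age] * ref$multiplier[age]
}

age <- c(4, 3, 3, 1, 5)
sex <- c(1, 1, 2, 2, 1)

# Both lines call lookup_product() once per (age, sex) pair
via_vectorize <- Vectorize(lookup_product)(age, sex)
via_mapply    <- mapply(lookup_product, age, sex)

all.equal(via_vectorize, via_mapply)
```

The function is still called once per row, so this is a convenience rather than a speed-up.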
2
votes

We need to call rowwise() before mutate() here, so that the function receives a single row's values at a time:

library(dplyr)

df %>%
  rowwise() %>%
  mutate(new_column = toy_function(age, sex))

# A tibble: 5 x 4
# Rowwise: 
  study_id   age   sex new_column
     <dbl> <dbl> <dbl>      <dbl>
1     1001     4     1       189.
2     1002     3     1       727.
3     1003     3     2       591.
4     1004     1     2       191.
5     1005     5     1       571.
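
One thing to keep in mind is that the result stays row-wise (note the "# Rowwise:" line in the output), which affects later verbs such as summarise(). A minimal sketch of dropping the grouping again with ungroup(), using a hypothetical helper scalar_only() and a cut-down data frame rather than the question's toy_function():

```r
library(dplyr)

df_small <- tibble(study_id = c(1001, 1002, 1003), age = c(4, 3, 1))

# Hypothetical scalar-only helper: stops if it receives more than one value
scalar_only <- function(x) {
  stopifnot(length(x) == 1)
  x * 10
}

out <- df_small %>%
  rowwise() %>%                          # each row becomes its own group
  mutate(scaled = scalar_only(age)) %>%  # so age has length 1 inside mutate()
  ungroup()                              # drop the row-wise grouping again

inherits(out, "rowwise_df")  # FALSE after ungroup()
```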

Or, if you would like to do it with purrr, you can use the following code. Because map2_dbl() walks through both vectors in parallel, .x refers to the value of age in each row and .y refers to the value of sex in the same row:

library(purrr)

df %>%
  mutate(new_column = map2_dbl(age, sex, ~ toy_function(.x, .y)))

# A tibble: 5 x 4
  study_id   age   sex new_column
     <dbl> <dbl> <dbl>      <dbl>
1     1001     4     1       189.
2     1002     3     1       727.
3     1003     3     2       591.
4     1004     1     2       191.
5     1005     5     1       571.

Or in base R (the \(x, y) lambda and the |> pipe require R 4.1 or later):

cbind(mapply(\(x, y) toy_function(x, y), df$age, df$sex) |>
        as.data.frame() |>
        setNames("new_column"), df)
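
Because both lookups here are plain row-index lookups, the row-wise iteration can also be avoided altogether. The sketch below is an alternative technique, not part of the approach above: it swaps in vector indexing plus ifelse(), which is vectorized. It assumes, as in the question's toy data, that age equals the row number in both reference tables, and it uses plain data frames in place of tibbles; male_product and female_product are hypothetical names.

```r
df <- data.frame(study_id = c(1001, 1002, 1003, 1004, 1005),
                 age = c(4, 3, 3, 1, 5),
                 sex = c(1, 1, 2, 2, 1))

reference_male <- data.frame(age_m = 1:5,
                             median_bmi_m = c(14.9, 16.3, 16.9, 17.2, 17.3),
                             multiplier_m = c(22, 23, 43, 11, 33))
reference_female <- data.frame(age_f = 1:5,
                               median_bmi_f = c(15.9, 17.3, 17.9, 18.2, 18.3),
                               multiplier_f = c(12, 13, 33, 21, 23))

# Precompute median * multiplier for every age in each table
male_product   <- reference_male$median_bmi_m * reference_male$multiplier_m
female_product <- reference_female$median_bmi_f * reference_female$multiplier_f

# Index both products by each patient's age, then pick element-wise by sex
df$new_column <- ifelse(df$sex == 1,
                        male_product[df$age],
                        female_product[df$age])
df
```

If ages in the real tables do not line up with row numbers, a join on age and sex would be the safer lookup.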

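Updated solution: because we are now iterating over more than two variables, we need pmap instead of map2. map2_dbl() accepts exactly two inputs, so in the call from the update bmi ends up in the position where the function is expected. Here the whole data frame is passed as the list, so ..1 is study_id and age, sex, and bmi are ..2, ..3, and ..4: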

df %>%
  mutate(new_column = pmap_dbl(., ~ toy_function(..2, ..3, ..4)))

# A tibble: 5 x 5
  study_id   age   sex   bmi new_column
     <dbl> <dbl> <dbl> <dbl>      <dbl>
1     1001     4     1    15      2838 
2     1002     3     1    16     11627.
3     1003     3     2    17     10042.
4     1004     1     2    18      3434.
5     1005     5     1    19     10847.

Or, if you want to stick with the ..1, ..2, ..3 numbering from your attempt, just exclude the first column from the .l argument of pmap:

df %>%
  mutate(new_column = pmap_dbl(df[-1], ~ toy_function(..1, ..2, ..3)))

With pmap we don't need rowwise() to force a row-wise operation, as the documentation notes:

Note that a data frame is a very important special case, in which case pmap() and pwalk() apply the function .f to each row.
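
If you would rather not depend on column positions at all, the columns can be passed to pmap_dbl() as an explicit list. The sketch below assumes a hypothetical three-argument function scale_bmi() in place of toy_function() and a cut-down data frame; the list is unnamed, so its elements are matched to the function's arguments by position.

```r
library(dplyr)
library(purrr)

# Hypothetical scalar function with the same (age, sex, bmi) signature
scale_bmi <- function(age, sex, bmi) {
  if (sex == 1) bmi * age else bmi / age
}

df_small <- tibble(study_id = c(1001, 1002, 1003),
                   age = c(4, 3, 1),
                   sex = c(1, 2, 2),
                   bmi = c(15, 16, 18))

out <- df_small %>%
  # Only the three needed columns are iterated over, in argument order
  mutate(new_column = pmap_dbl(list(age, sex, bmi), scale_bmi))

out
```

This keeps working if columns are later added to or reordered in the data frame, which the ..2, ..3, ..4 form does not.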