Suppose you want to calculate how far a patient's BMI is from the population median using a Z-score. The Z-score is calculated from the patient's BMI plus three age- and sex-dependent variables, which are looked up in a table.
So, I created a function that takes age, sex, and BMI as inputs. It uses sex to find the appropriate table (male or female), age to find the appropriate row in that table, and then BMI in a calculation that incorporates the age- and sex-specific variables it just looked up. My function works when I enter the data manually, but what I can't figure out is how to go through each row of the data frame and apply my function, using the other columns in that row as inputs.
For the sake of simplicity, I use just two age- and sex-dependent variables below: a median BMI and a multiplier.
### make master dataframe
study_id <- c(1001, 1002, 1003, 1004, 1005)
age <- c(4, 3, 3, 1, 5)
sex <- c(1, 1, 2, 2, 1)
df <- tibble(study_id, age, sex)
### reference male table
age_m <- c(1, 2, 3, 4, 5)
median_bmi_m <- c(14.9, 16.3, 16.9, 17.2, 17.3)
multiplier_m <- c(22, 23, 43, 11, 33)
reference_male <- tibble(age_m, median_bmi_m, multiplier_m)
### reference female table
age_f <- c(1, 2, 3, 4, 5)
median_bmi_f <- c(15.9, 17.3, 17.9, 18.2, 18.3)
multiplier_f <- c(12, 13, 33, 21, 23)
reference_female <- tibble(age_f, median_bmi_f, multiplier_f)
### my function
toy_function <- function(age, sex) {
  if (sex == 1) {
    a <- reference_male[age, 2]
    b <- reference_male[age, 3]
    c <- a * b
  } else {
    a <- reference_female[age, 2]
    b <- reference_female[age, 3]
    c <- a * b
  }
  return(as.numeric(c))
}
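To make "works when I enter the data manually" concrete, here is a minimal sketch of calling the function with single values. It assumes the tibble package is installed; the names `manual_male` and `manual_female` are only illustrative and do not appear elsewhere in this post.

```r
library(tibble)

# Reference tables, copied from the post
reference_male <- tibble(
  age_m = c(1, 2, 3, 4, 5),
  median_bmi_m = c(14.9, 16.3, 16.9, 17.2, 17.3),
  multiplier_m = c(22, 23, 43, 11, 33)
)
reference_female <- tibble(
  age_f = c(1, 2, 3, 4, 5),
  median_bmi_f = c(15.9, 17.3, 17.9, 18.2, 18.3),
  multiplier_f = c(12, 13, 33, 21, 23)
)

# Two-input toy function, as defined in the post
toy_function <- function(age, sex) {
  if (sex == 1) {
    a <- reference_male[age, 2]
    b <- reference_male[age, 3]
    c <- a * b
  } else {
    a <- reference_female[age, 2]
    b <- reference_female[age, 3]
    c <- a * b
  }
  return(as.numeric(c))
}

# Scalar inputs: one age and one sex per call
manual_male <- toy_function(4, 1)    # male, age 4: 17.2 * 11
manual_female <- toy_function(3, 2)  # female, age 3: 17.9 * 33
print(c(manual_male, manual_female))
```

With one age and one sex per call, `if (sex == 1)` sees a single value and each lookup returns a single cell, which is why the manual case works.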
The function returns a numeric value "c" that I want to compute row-wise for each patient. I built a for loop that does it, but I suspect there is a more elegant way using purrr or the apply() family of functions. I tried simply putting the function inside mutate(), but I get an error.
df <- df %>%
  mutate(new_column = toy_function(age, sex))
Error in toy_function(age_df, sex) :
'list' object cannot be coerced to type 'double'
In addition: Warning message:
In if (sex == 1) { :
the condition has length > 1 and only the first element will be used
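For comparison, a for loop passes one row's values at a time, so `if (sex == 1)` only ever sees a single value. The loop I used is not shown in this post, so the following is only an illustrative sketch of that approach; it assumes the tibble package is installed, and `new_column` is an assumed column name.

```r
library(tibble)

# Reference tables and data, copied from the post
reference_male <- tibble(
  age_m = c(1, 2, 3, 4, 5),
  median_bmi_m = c(14.9, 16.3, 16.9, 17.2, 17.3),
  multiplier_m = c(22, 23, 43, 11, 33)
)
reference_female <- tibble(
  age_f = c(1, 2, 3, 4, 5),
  median_bmi_f = c(15.9, 17.3, 17.9, 18.2, 18.3),
  multiplier_f = c(12, 13, 33, 21, 23)
)
df <- tibble(
  study_id = c(1001, 1002, 1003, 1004, 1005),
  age = c(4, 3, 3, 1, 5),
  sex = c(1, 1, 2, 2, 1)
)

# Two-input toy function, as defined in the post
toy_function <- function(age, sex) {
  if (sex == 1) {
    a <- reference_male[age, 2]
    b <- reference_male[age, 3]
    c <- a * b
  } else {
    a <- reference_female[age, 2]
    b <- reference_female[age, 3]
    c <- a * b
  }
  return(as.numeric(c))
}

# Pre-allocate the result column, then fill it one row at a time
df$new_column <- NA_real_
for (i in seq_len(nrow(df))) {
  df$new_column[i] <- toy_function(df$age[i], df$sex[i])
}
print(df)
```

Inside mutate(), by contrast, `age` and `sex` arrive as whole columns, which matches the "condition has length > 1" warning above.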
Thanks for your help. I still haven't quite gotten a good handle on purrr and other row-wise iteration strategies.
UPDATE
Thanks all for the answers. While the offered solution to the original toy example worked, when I went back to my original more complex function (with three inputs instead of two), I am getting an error message.
Suppose we update the function and the original data frame to incorporate BMI as follows:
### updated dataframe with BMI variable
study_id <- c(1001, 1002, 1003, 1004, 1005)
age <- c(4, 3, 3, 1, 5)
sex <- c(1, 1, 2, 2, 1)
bmi <- c(15, 16, 17, 18, 19)
df <- tibble(study_id, age, sex, bmi)
### updated function with bmi variable incorporated into the equation
toy_function <- function(age, sex, bmi) {
  if (sex == 1) {
    a <- reference_male[age, 2]
    b <- reference_male[age, 3]
    c <- a * b * bmi
  } else {
    a <- reference_female[age, 2]
    b <- reference_female[age, 3]
    c <- a * b * bmi
  }
  return(as.numeric(c))
}
When I run the solution code as such, I get the following error:
df %>%
mutate(new_column = map2_dbl(age, sex, bmi, ~ toy_function(..1, ..2, ..3)))
Result 1 must be a single double, not NULL of length 0
It appears that there is something that I'm doing incorrectly as I add in the third variable. Note: I read that the ..1, ..2, ..3 syntax may be preferred when you have multiple variables in the function, but I may be mistaken.
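One possible reading of the error, offered as a sketch rather than a confirmed diagnosis: purrr's `map2_dbl()` has the signature `map2_dbl(.x, .y, .f, ...)`, so it iterates over exactly two vectors. In the call above, `bmi` lands in the `.f` position and the formula is pushed into `...`, which would be consistent with the NULL result. For three or more inputs, purrr provides `pmap_dbl()`, which takes a list of vectors. The example below assumes the tibble, dplyr, and purrr packages are installed; `result` is an illustrative name.

```r
library(tibble)
library(dplyr)
library(purrr)

# Reference tables and data, copied from the post
reference_male <- tibble(
  age_m = c(1, 2, 3, 4, 5),
  median_bmi_m = c(14.9, 16.3, 16.9, 17.2, 17.3),
  multiplier_m = c(22, 23, 43, 11, 33)
)
reference_female <- tibble(
  age_f = c(1, 2, 3, 4, 5),
  median_bmi_f = c(15.9, 17.3, 17.9, 18.2, 18.3),
  multiplier_f = c(12, 13, 33, 21, 23)
)
df <- tibble(
  study_id = c(1001, 1002, 1003, 1004, 1005),
  age = c(4, 3, 3, 1, 5),
  sex = c(1, 1, 2, 2, 1),
  bmi = c(15, 16, 17, 18, 19)
)

# Three-input toy function, as defined in the post
toy_function <- function(age, sex, bmi) {
  if (sex == 1) {
    a <- reference_male[age, 2]
    b <- reference_male[age, 3]
    c <- a * b * bmi
  } else {
    a <- reference_female[age, 2]
    b <- reference_female[age, 3]
    c <- a * b * bmi
  }
  return(as.numeric(c))
}

# pmap_dbl() walks the three vectors in parallel and calls
# toy_function(age[i], sex[i], bmi[i]) once per row
result <- df %>%
  mutate(new_column = pmap_dbl(list(age, sex, bmi), toy_function))
print(result)
```

The formula style should also work here, as in `pmap_dbl(list(age, sex, bmi), ~ toy_function(..1, ..2, ..3))`, since `..1`, `..2`, and `..3` refer to the first, second, and third list elements.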