
I'm trying to create a function that will compare variables 1 and 2 and create a third variable based on whether they match. I need to do this >25 times (for different combinations of variables), which is why I want to create a function instead of just using mutate and case_when.

I'm pretty new to R, so this is mostly cobbled together from other helpful Stack Overflow posts and miscellaneous tutorials.

Here's what I tried:

determine_match <- function(df, col_a, col_b){
  col_a <- enquo(col_a)
  col_b <- enquo(col_b)
  newvar <- paste0(quo_name(col_a), quo_name(col_b))
  df <- df %>% mutate(!!newvar:= case_when(
    !!col_a == '1' & !!col_b =='Yes' ~ 'Match',
    !!col_a == '0' & !! col_b == 'No' ~ 'Match',
    !!col_a == '1' & !!col_b == 'No' ~ 'No Match',
    !!col_a == '0' & !!col_b == 'Yes' ~ 'No Match',
    is.na(!!col_a) | is.na(!!col_b) ~ NA_character_,
    TRUE ~ 'Error'
  )) 
}

And I tested it on this data set:

test1 <- c('1', '0', '1', '1', '0', NA)
test2 <- c('Yes', 'No', 'No,', NA, 'Yes', NA)
id <- c(1,2,3,4,5,6)
testing.df <- data.frame(id, test1, test2)

I'm not getting errors, but when I run the function with a print statement, it only returns the string name for newvar and doesn't change the actual data frame.

I also tried testing.df %>% mutate(testing3 = funs(determine_match(testing.df, testing1, testing2))), but testing3 just contains ~determine_match(testing.df, testing1, testing2).

Not sure if the problem is the function, the attempt to apply, or both.

Hope some kind soul can help, thank you!!

1
Please show how you're running the function. Are you doing result.df <- determine_match(testing.df, test1, test2) and result.df isn't what you expect? - Gregor Thomas
I think you just need return(df) at the end of your function... though you could probably simplify the code - Gregor Thomas

1 Answer


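You need to return the result: add return(df) (or even just df) as the last line of your function. As written, the last expression in the function is an assignment, so nothing is printed when you call it.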
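
To illustrate why the original function seemed to do nothing: in R, a function whose last expression is an assignment still returns that value, but invisibly, so nothing shows up at the console. Here is a minimal base-R sketch of the difference (f_invisible and f_visible are hypothetical names used only for illustration):

```r
# Hypothetical toy functions, only to show visible vs. invisible return values.
f_invisible <- function(x) {
  y <- x + 1   # assignment is the last expression: value is returned invisibly
}

f_visible <- function(x) {
  y <- x + 1
  y            # bare name as the last expression: value is returned visibly
}

f_invisible(1)          # nothing is printed at the console
f_visible(1)            # the result is printed
res <- f_invisible(1)   # the value is still there if you capture it
res
```

Either way, the input data frame is never modified in place, so the result has to be assigned to something to be kept.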

If you're not worried about input values other than the ones you explicitly mention ("0", "1", NA for col_a, and "Yes", "No", NA for col_b), you could simplify the condition to this (for some definitions of "simplify"; it's definitely shorter).

determine_match <- function(df, col_a, col_b) {
  col_a <- enquo(col_a)
  col_b <- enquo(col_b)
  newvar <- paste0(quo_name(col_a), quo_name(col_b))
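  df <- df %>% mutate(
    # The comparison is TRUE when both sides agree, FALSE when they differ,
    # and NA when either value is missing; adding 1 turns that into an index
    # (1 = "No Match", 2 = "Match", NA stays NA).
    !!newvar := 
      c("No Match", "Match")[((!!col_a == '1') == (!!col_b == 'Yes')) + 1]
  )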
  return(df)
}
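
For completeness, here is a sketch of how the function might be called on the test data from the question, and how it could be applied to many column pairs. The list of pairs and the names result.df and all.df are assumptions for illustration; the loop relies on sym() and !! to turn column names stored as strings into the bare names the function expects.

```r
library(dplyr)

# Same function as above, repeated so this block runs on its own.
determine_match <- function(df, col_a, col_b) {
  col_a <- enquo(col_a)
  col_b <- enquo(col_b)
  newvar <- paste0(quo_name(col_a), quo_name(col_b))
  df <- df %>% mutate(
    !!newvar :=
      c("No Match", "Match")[((!!col_a == "1") == (!!col_b == "Yes")) + 1]
  )
  return(df)
}

# Test data from the question
testing.df <- data.frame(
  id    = c(1, 2, 3, 4, 5, 6),
  test1 = c("1", "0", "1", "1", "0", NA),
  test2 = c("Yes", "No", "No,", NA, "Yes", NA)
)

# Capture the returned data frame; testing.df itself is left unchanged.
result.df <- determine_match(testing.df, test1, test2)
names(result.df)  # now includes "test1test2"

# Hypothetical: apply the function to several pairs given as strings.
pairs <- list(c("test1", "test2"))  # add further pairs here
all.df <- testing.df
for (p in pairs) {
  all.df <- determine_match(all.df, !!sym(p[[1]]), !!sym(p[[2]]))
}
```

Note that with this shortened condition, any col_b value other than "Yes" is treated as "No", so the 'No,' entry in the test data comes out as "No Match" rather than "Error".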