dplyr rowwise mutate with custom function returns unexpected NA

Question

I have a data set with two columns that I would like to combine into a single column using dplyr's rowwise() and mutate() with a custom function. Strangely, for the second row matching a certain pattern (but not the first or any subsequent ones), I get NA as the return value. Below is an example:

my.func <- function(alpha, beta) {
  if (!is.na(beta) && beta) {
    return("c")
  } else if(is.na(alpha)) {
    return(as.character(NA))
  } else if (alpha == "a") {
    return("a")
  } else if (alpha == "b") {
    return("b")
  } else {
    return(as.character(NA))
  }
}

library(dplyr)

tmp <- data.frame(obs = 1:7,
                  dt = c('2016-03-15 17:35:46','2016-03-15 18:45:47','2016-03-15 19:22:17','2016-03-15 19:23:45','2016-03-15 20:21:55','2016-03-15 21:20:10','2016-03-15 22:18:34'),
                  one = c(NA,"a","a","a","b","a","b"), two = c(NA,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE))

tmp2 <- tmp %>% rowwise() %>% mutate(three = my.func(one, two))

This results in an NA in the third row of column "three", even though the row above, with exactly the same input, yields "a".

Do you mean the third printed row or the row with a rowname of "3"? The row with a rowname of "1" is the only NA value when I run it, and that is entirely expected. (And I think you are referring to column 5.) Time for a version check? — IRTFM
I am really puzzled by this. mapply(my.func, tmp$one, tmp$two) gives the correct result. And if you change the first NA, the result becomes correct as well... Maybe it has something to do with rowwise(). — Alex
This reported issue seems to fit the anomalous results: github.com/hadley/dplyr/issues/1908 — IRTFM
@42- To clarify, I meant the third row and the fifth column, which is called "three". Sorry for the confusion. I have dplyr version 0.4.3. I will upgrade. Thank you for your help. — jarkub

thepule · Accepted Answer · 2016-06-20T23:07:52

I don't understand exactly why your code does not work, but the following seems to do what you expect:

tmp2 <- tmp %>% mutate(three = mapply(my.func, one, two))

> tmp2
  obs                  dt  one   two three
1   1 2016-03-15 17:35:46 <NA>    NA  <NA>
2   2 2016-03-15 18:45:47    a FALSE     a
3   3 2016-03-15 19:22:17    a FALSE     a
4   4 2016-03-15 19:23:45    a FALSE     a
5   5 2016-03-15 20:21:55    b FALSE     b
6   6 2016-03-15 21:20:10    a  TRUE     c
7   7 2016-03-15 22:18:34    b FALSE     b
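
As a possible alternative that avoids row-by-row evaluation altogether, the same decision logic can be sketched in vectorized form with base R's ifelse(). The name my.func.vec is hypothetical (it does not appear in the question), and the sketch assumes the only values of interest in "one" are "a" and "b", as in the original function:

```r
# Hypothetical vectorized counterpart of my.func (name chosen for illustration).
# It takes whole columns rather than single values, so rowwise() is not needed.
my.func.vec <- function(alpha, beta) {
  alpha <- as.character(alpha)  # also covers the case where alpha is a factor
  ifelse(!is.na(beta) & beta, "c",
         ifelse(alpha %in% c("a", "b"), alpha, NA_character_))
}

# Same "one" and "two" columns as in the question
tmp <- data.frame(one = c(NA, "a", "a", "a", "b", "a", "b"),
                  two = c(NA, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE))

tmp$three <- my.func.vec(tmp$one, tmp$two)
print(tmp$three)
# [1] NA  "a" "a" "a" "b" "c" "b"
```

Because the function operates on whole vectors, it should also be usable directly inside mutate(), e.g. mutate(three = my.func.vec(one, two)), without rowwise() or mapply().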
