0
votes

Does anyone know why dplyr::case_when() throws an error in the following code?

tibble(tmp1 = sample(c(T, F), size = 32, replace = T),
       tmp2 = sample(c(T, F), size = 32, replace = T),
       tmp3 = sample(c(T, F), size = 32, replace = T)) %>%
  mutate(tmp = apply(cbind(tmp1, tmp2, tmp3), 1, function(x) {
    case_when(
      all(x == F) ~ "N",
      any(x == T) ~ "Y"
    )
  }))

Error in mutate_impl(.data, dots) : 
  Evaluation error: object 'x' not found.

I am using R 3.4.3 with dplyr 0.7.4 on Ubuntu 16.04.

The error message is quite confusing, since the following code works fine, which indicates that x is not missing:

tibble(tmp1 = sample(c(T, F), size = 32, replace = T),
       tmp2 = sample(c(T, F), size = 32, replace = T),
       tmp3 = sample(c(T, F), size = 32, replace = T)) %>%
  mutate(tmp = apply(cbind(tmp1, tmp2, tmp3), 1, function(x) {
    if (all(x == F)) {
      "N"
    } else if(any(x == T)) {
      "Y"
    }
  }))

Just for reference, the following code also works fine:

cbind(tmp1 = sample(c(T, F), size = 32, replace = T),
      tmp2 = sample(c(T, F), size = 32, replace = T),
      tmp3 = sample(c(T, F), size = 32, replace = T)) %>%
  apply(1, function(x) {
    case_when(
      all(x == F) ~ "N",
      any(x == T) ~ "Y"
    )
  })
3
@Maurits Evers and @www provided some really good alternatives. But the actual use case is more complicated than the example (more variables, more rules, handling of NA values), so a more flexible form is preferred (e.g., if-else or case_when). Do you know why the case_when() (which is more convenient than if-else) code produces the error? - NickZeng
Didn't @www give you the answer as to why case_when doesn't work? ("The issue is case_when does not do row-wise operation.") BTW, I don't get an error, but all the entries are NA. - Maurits Evers
@MauritsEvers Well, row-wise operation is possible if you put case_when inside apply; see the updated example in the question. I think @www did not explain why "object x" couldn't be found. I am using Ubuntu 16.04. What system are you using, @MauritsEvers? Is it a platform-specific problem? - NickZeng
@NickZeng If the posts that MauritsEvers or I provided generate the desired output for your example, please consider accepting one of them as the answer. If your actual use case is more complicated than the example you provided here, please consider asking a new question with examples that actually represent your real-world data. - www
And just to reiterate: There's no error on my side when I run the first code chunk. However, all tmp entries are NA. - Maurits Evers

3 Answers

0
votes

The issue is that case_when does not operate row-wise: it is vectorized and evaluates each condition over whole columns. However, we can simplify the code by combining case_when with rowSums, which does operate row-wise.

library(dplyr)

set.seed(151)

tibble(tmp1 = sample(c(T, F), size = 32, replace = T),
       tmp2 = sample(c(T, F), size = 32, replace = T),
       tmp3 = sample(c(T, F), size = 32, replace = T)) %>%
  mutate(tmp = case_when(
      rowSums(.) == 0   ~"N",
      rowSums(.) > 0    ~"Y" 
    ))

# # A tibble: 32 x 4
#   tmp1  tmp2  tmp3  tmp  
#   <lgl> <lgl> <lgl> <chr>
#  1 TRUE  TRUE  FALSE Y    
#  2 FALSE FALSE TRUE  Y    
#  3 FALSE FALSE TRUE  Y    
#  4 FALSE FALSE TRUE  Y    
#  5 TRUE  FALSE FALSE Y    
#  6 FALSE FALSE FALSE N    
#  7 TRUE  FALSE FALSE Y    
#  8 FALSE TRUE  FALSE Y    
#  9 TRUE  TRUE  FALSE Y    
# 10 FALSE FALSE TRUE  Y    
# # ... with 22 more rows

Or, since there are only two conditions, rowSums with ifelse is sufficient.

set.seed(151)

tibble(tmp1 = sample(c(T, F), size = 32, replace = T),
       tmp2 = sample(c(T, F), size = 32, replace = T),
       tmp3 = sample(c(T, F), size = 32, replace = T)) %>%
  mutate(tmp = ifelse(rowSums(.) == 0, "N", "Y"))
# # A tibble: 32 x 4
#   tmp1  tmp2  tmp3  tmp  
#   <lgl> <lgl> <lgl> <chr>
#  1 TRUE  TRUE  FALSE Y    
#  2 FALSE FALSE TRUE  Y    
#  3 FALSE FALSE TRUE  Y    
#  4 FALSE FALSE TRUE  Y    
#  5 TRUE  FALSE FALSE Y    
#  6 FALSE FALSE FALSE N    
#  7 TRUE  FALSE FALSE Y    
#  8 FALSE TRUE  FALSE Y    
#  9 TRUE  TRUE  FALSE Y    
# 10 FALSE FALSE TRUE  Y    
# # ... with 22 more rows
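
Note that rowSums(.) sums over every column of the piped data frame, so it only fits when all columns are logical (or numeric) flags. A minimal sketch of the same idea restricted to named columns, assuming the flag columns are called tmp1, tmp2 and tmp3 as in the question; the id column and the n_true helper column are hypothetical and only there for illustration:

```r
library(dplyr)

set.seed(151)

df <- tibble(id   = seq_len(32),  # hypothetical extra column, not in the question
             tmp1 = sample(c(TRUE, FALSE), size = 32, replace = TRUE),
             tmp2 = sample(c(TRUE, FALSE), size = 32, replace = TRUE),
             tmp3 = sample(c(TRUE, FALSE), size = 32, replace = TRUE))

res <- df %>%
  mutate(
    # Count the TRUE values per row, over the flag columns only
    n_true = rowSums(cbind(tmp1, tmp2, tmp3)),
    # case_when is vectorized, so each condition is evaluated for all rows at once
    tmp = case_when(
      n_true == 0 ~ "N",
      n_true > 0  ~ "Y"
    )
  )
```

Further rules can be added as extra case_when branches on n_true or on the individual columns.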
0
votes

How about using Reduce and logical OR?

library(dplyr)

set.seed(151)
tibble(tmp1 = sample(c(T, F), size = 32, replace = T),
       tmp2 = sample(c(T, F), size = 32, replace = T),
       tmp3 = sample(c(T, F), size = 32, replace = T)) %>%
    mutate(tmp = Reduce(`|`, list(tmp1, tmp2, tmp3)))
## A tibble: 32 x 4
#   tmp1  tmp2  tmp3  tmp
#   <lgl> <lgl> <lgl> <lgl>
# 1 TRUE  TRUE  FALSE TRUE
# 2 FALSE FALSE TRUE  TRUE
# 3 FALSE FALSE TRUE  TRUE
# 4 FALSE FALSE TRUE  TRUE
# 5 TRUE  FALSE FALSE TRUE
# 6 FALSE FALSE FALSE FALSE
# 7 TRUE  FALSE FALSE TRUE
# 8 FALSE TRUE  FALSE TRUE
# 9 TRUE  TRUE  FALSE TRUE
#10 FALSE FALSE TRUE  TRUE
## ... with 22 more rows
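
The result here is a logical column rather than the "Y"/"N" labels from the question. If the labels are needed, one option is to wrap the Reduce call in if_else. This is only a sketch on the same simulated data:

```r
library(dplyr)

set.seed(151)

res <- tibble(tmp1 = sample(c(TRUE, FALSE), size = 32, replace = TRUE),
              tmp2 = sample(c(TRUE, FALSE), size = 32, replace = TRUE),
              tmp3 = sample(c(TRUE, FALSE), size = 32, replace = TRUE)) %>%
  # Reduce() folds the columns with element-wise OR; if_else() maps TRUE/FALSE to labels
  mutate(tmp = if_else(Reduce(`|`, list(tmp1, tmp2, tmp3)), "Y", "N"))
```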
0
votes

As it turns out, this is a bug, probably related to the hybrid evaluator: https://github.com/tidyverse/dplyr/issues/3422
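
Since the third snippet in the question shows that the same apply()/case_when() call works outside mutate(), one possible workaround is to build the column outside mutate() and attach it afterwards. A minimal sketch under that assumption; classify_row is a hypothetical helper name, and the rules are the ones from the question:

```r
library(dplyr)

set.seed(151)

df <- tibble(tmp1 = sample(c(TRUE, FALSE), size = 32, replace = TRUE),
             tmp2 = sample(c(TRUE, FALSE), size = 32, replace = TRUE),
             tmp3 = sample(c(TRUE, FALSE), size = 32, replace = TRUE))

# Hypothetical helper: classifies a single row given as a logical vector
classify_row <- function(x) {
  case_when(
    all(!x) ~ "N",
    any(x)  ~ "Y"
  )
}

# Run apply() outside of mutate(), then attach the result as a plain column
df$tmp <- apply(as.matrix(df[c("tmp1", "tmp2", "tmp3")]), 1, classify_row)
```

Keeping the rules in a separate function also leaves room for more branches, such as explicit handling of NA values.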