1
votes

I have a tibble dataframe as follows:

data = tibble(X = c("a", "b", "c", "d","c"), 
          Y = c("a1", "b1", "c1", "d1","c1"), 
          Z = c("a2", "b2", "c2", "d2","c2"),
          all = list(c("a"), c("b"), c("c", "c1"), c("d", "d2"), c("c", "b2")))

I want to create a new column "result" such that for each row :
- if the "Y" value is in "all" => result equals the Y value
- if the "Z" value is in "all" => result equals the Z value
- else result equals "none"

I've tried the following code with the dplyr syntax.

data %>%
mutate(result = case_when(Y %in% all ~ Y,
                          Z %in% all ~ Z,
                          TRUE ~ "none"))

It doesn't work as expected and returns:

# A tibble: 4 x 5
  X     Y     Z     all       result
  <chr> <chr> <chr> <list>    <chr> 
1 a     a1    a2    <chr [2]> none  
2 b     b1    b2    <chr [1]> none     
3 c     c1    c2    <chr [2]> none  
4 d     d1    d2    <chr [2]> none 

When I'd like to obtain :

# A tibble: 4 x 5
  X     Y     Z     all       result
  <chr> <chr> <chr> <list>    <chr> 
1 a     a1    a2    <chr [2]> none  
2 b     b1    b2    <chr [1]> none     
3 c     c1    c2    <chr [2]> c1  
4 d     d1    d2    <chr [2]> d2  

EDIT

One problem is to unlist the values in the column list as stated by Ronak Shah. But even with this solution, the behavior is like working on the column list will consider all the values of the column and not only those of a row.

Here is what I obtain with the proposed solution and the edited data :

data %>%
    mutate(result = case_when(Y %in% flatten_chr(all) ~ Y,
                              Z %in% flatten_chr(all) ~ Z,
                              TRUE ~ "none")) 

# A tibble: 5 x 5
  X     Y     Z     all       result
  <chr> <chr> <chr> <list>    <chr> 
1 a     a1    a2    <chr [1]> none  
2 b     b1    b2    <chr [1]> b2    
3 c     c1    c2    <chr [2]> c1    
4 d     d1    d2    <chr [2]> d2    
5 c     c1    c2    <chr [2]> c1 

When it should return :

# A tibble: 5 x 5
  X     Y     Z     all       result
  <chr> <chr> <chr> <list>    <chr> 
1 a     a1    a2    <chr [1]> none  
2 b     b1    b2    <chr [1]> none    
3 c     c1    c2    <chr [2]> c1    
4 d     d1    d2    <chr [2]> d2    
5 c     c1    c2    <chr [2]> none 
2
data %>% rowwise() %>% mutate(result = case_when(Y %in% unlist(all) ~ Y, Z %in% unlist(all) ~ Z,TRUE ~ "none")) Should do what you wantA. Suliman

2 Answers

4
votes

all column actually is a list, you cannot directly compare the values in list column.

For example,

 "a" %in% list(c("a", "b"))
#[1] FALSE

You need unlist or flatten_chr to make it work.

"a" %in% unlist(list(c("a", "b")))
#[1] TRUE

"a" %in% flatten_chr(list(c("a", "b")))
#[1] TRUE

So, now you can do

library(tidyverse)

data %>%
  mutate(result = case_when(Y %in% flatten_chr(all) ~ Y,
                            Z %in% flatten_chr(all) ~ Z,
                            TRUE ~ "none"))


#  X     Y     Z     all      result
#  <chr> <chr> <chr> <list>    <chr> 
#1 a     a1    a2    <chr [2]> none  
#2 b     b1    b2    <chr [1]> none  
#3 c     c1    c2    <chr [2]> c1    
#4 d     d1    d2    <chr [2]> d2  

EDIT

You could add rowwise to compare the values for each row.

data %>%
  rowwise() %>%
  mutate(result = case_when(Y %in% all ~ Y,
                            Z %in% all ~ Z,
                            TRUE ~ "none"))


# A tibble: 5 x 5
#  X     Y     Z     all       result
#  <chr> <chr> <chr> <list>    <chr> 
#1 a     a1    a2    <chr [1]> none  
#2 b     b1    b2    <chr [1]> none  
#3 c     c1    c2    <chr [2]> c1    
#4 d     d1    d2    <chr [2]> d2    
#5 c     c1    c2    <chr [2]> none  
0
votes

I have used ifelse and mapply and any to go through the list and generating the values based on the given conditions as shown below

data$result <- ifelse(unlist(mapply(
  data$all,data$Z,
  FUN =  function(x, y) {
    any(x%in% y)
  }
)), data$Z, ifelse(unlist(mapply(
  data$all, data$Y,
  FUN =  function(x, y) {
    any(x%in% y)
  }
)), data$Y, "none"))

#View Data
data

alternatively:


library(dplyr)

data %>%
  mutate(result = ifelse(unlist(mapply(
    all,Z,
    FUN =  function(x,y) {
      any(x%in% y)
    }
  )), Z, ifelse(unlist(mapply(
    all,Y,
    FUN =  function(x, y) {
      any(x%in% y)
    }
  )), Y, "none")))

This produces

enter image description here