dplyr & tibble - Create a new column according to values in a list column

Question

I have a tibble dataframe as follows:

data = tibble(X = c("a", "b", "c", "d","c"), 
          Y = c("a1", "b1", "c1", "d1","c1"), 
          Z = c("a2", "b2", "c2", "d2","c2"),
          all = list(c("a"), c("b"), c("c", "c1"), c("d", "d2"), c("c", "b2")))

I want to create a new column "result" such that for each row :
- if the "Y" value is in "all" => result equals the Y value
- if the "Z" value is in "all" => result equals the Z value
- else result equals "none"

I've tried the following code with the dplyr syntax.

data %>%
mutate(result = case_when(Y %in% all ~ Y,
                          Z %in% all ~ Z,
                          TRUE ~ "none"))

It doesn't work as expected and returns:

# A tibble: 4 x 5
  X     Y     Z     all       result
  <chr> <chr> <chr> <list>    <chr> 
1 a     a1    a2    <chr [2]> none  
2 b     b1    b2    <chr [1]> none     
3 c     c1    c2    <chr [2]> none  
4 d     d1    d2    <chr [2]> none

When I'd like to obtain :

# A tibble: 4 x 5
  X     Y     Z     all       result
  <chr> <chr> <chr> <list>    <chr> 
1 a     a1    a2    <chr [2]> none  
2 b     b1    b2    <chr [1]> none     
3 c     c1    c2    <chr [2]> c1  
4 d     d1    d2    <chr [2]> d2

EDIT

One problem is to unlist the values in the column list as stated by Ronak Shah. But even with this solution, the behavior is like working on the column list will consider all the values of the column and not only those of a row.

Here is what I obtain with the proposed solution and the edited data :

data %>%
    mutate(result = case_when(Y %in% flatten_chr(all) ~ Y,
                              Z %in% flatten_chr(all) ~ Z,
                              TRUE ~ "none")) 

# A tibble: 5 x 5
  X     Y     Z     all       result
  <chr> <chr> <chr> <list>    <chr> 
1 a     a1    a2    <chr [1]> none  
2 b     b1    b2    <chr [1]> b2    
3 c     c1    c2    <chr [2]> c1    
4 d     d1    d2    <chr [2]> d2    
5 c     c1    c2    <chr [2]> c1

When it should return :

# A tibble: 5 x 5
  X     Y     Z     all       result
  <chr> <chr> <chr> <list>    <chr> 
1 a     a1    a2    <chr [1]> none  
2 b     b1    b2    <chr [1]> none    
3 c     c1    c2    <chr [2]> c1    
4 d     d1    d2    <chr [2]> d2    
5 c     c1    c2    <chr [2]> none

data %>% rowwise() %>% mutate(result = case_when(Y %in% unlist(all) ~ Y, Z %in% unlist(all) ~ Z,TRUE ~ "none")) Should do what you want — A. Suliman

Ronak Shah Ronak Shah · Accepted Answer · 2019-03-21T09:02:17

all column actually is a list, you cannot directly compare the values in list column.

For example,

 "a" %in% list(c("a", "b"))
#[1] FALSE

You need unlist or flatten_chr to make it work.

"a" %in% unlist(list(c("a", "b")))
#[1] TRUE

"a" %in% flatten_chr(list(c("a", "b")))
#[1] TRUE

So, now you can do

library(tidyverse)

data %>%
  mutate(result = case_when(Y %in% flatten_chr(all) ~ Y,
                            Z %in% flatten_chr(all) ~ Z,
                            TRUE ~ "none"))


#  X     Y     Z     all      result
#  <chr> <chr> <chr> <list>    <chr> 
#1 a     a1    a2    <chr [2]> none  
#2 b     b1    b2    <chr [1]> none  
#3 c     c1    c2    <chr [2]> c1    
#4 d     d1    d2    <chr [2]> d2

EDIT

You could add rowwise to compare the values for each row.

data %>%
  rowwise() %>%
  mutate(result = case_when(Y %in% all ~ Y,
                            Z %in% all ~ Z,
                            TRUE ~ "none"))


# A tibble: 5 x 5
#  X     Y     Z     all       result
#  <chr> <chr> <chr> <list>    <chr> 
#1 a     a1    a2    <chr [1]> none  
#2 b     b1    b2    <chr [1]> none  
#3 c     c1    c2    <chr [2]> c1    
#4 d     d1    d2    <chr [2]> d2    
#5 c     c1    c2    <chr [2]> none

dplyr & tibble - Create a new column according to values in a list column

2 Answers