2
votes

I have following problem. I have list column of tibbles, which I want to clean up by just taking certain values and creating new tibble from them. Problem is that not all columns are available in original tibble, so I would like to replace them with NA. Example will clear this one.

DATA:

tbl <- tibble(list_col = list(tibble(id = c("A", "B", "C"),
                                    val1 = c(1, 2, 3),
                                    val2 = c(11, 22, 33)),
                             tibble(id = c("A", "B", "C"),
                                    val1 = c(1, 2, 3))))

Real data is more complicated, but this will suffice to show problem. I have created function to pick columns I would like to have. I wrap everything now days with possibly to avoid falling short.

get_res <- possibly(function(ls){
  out = tibble(id = ls$id,
               val1 = ls$val1,
               val2 = ls$val2)
  return(out)
}, otherwise = NA)

Then I usually map function with mutate to create clean column.

tbl %>% 
  mutate(clean = map(list_col, get_res))

# A tibble: 2 x 2
  list_col         clean           
  <list>           <list>          
1 <tibble [3 x 3]> <tibble [3 x 3]>
2 <tibble [3 x 2]> <lgl [1]>       
Warning message:
Unknown or uninitialised column: 'val2'. 

Expected result, but how can I go around this and replace val2 with NA in second tibble and return values what I have there?

I thought this would have worked:

get_res2 <- possibly(function(ls){
  out = tibble(id = ls$id,
               val1 = ls$val1,
               val2 = tryCatch(ls$val2, error=function(e) NA))
  return(out)
}, otherwise = NA)

But this will fail as well.

1
Since your function explicitly uses id, val1, val2, can we assume that your actual case also has only few columns with known names? So that we don't need to look for all possible column names in tbl first.Julius Vainora
I know all the column names I want to pull, so no worries there.Hakki

1 Answers

0
votes

Here's a nice trick for this:

tbl %>% 
  mutate(clean = map(list_col, bind_rows,
                     tibble(id = character(), val1 = double(), val2 = double())))
# A tibble: 2 x 2
#   list_col         clean           
#   <list>           <list>          
# 1 <tibble [3 × 3]> <tibble [3 × 3]>
# 2 <tibble [3 × 2]> <tibble [3 × 3]>

Exploiting the fact that you know in advance what the columns are + creating an empty tibble + bind_rows to fill the empty ones with NA.