6
votes

I would like to use dplyr's mutate_if() function to convert list-columns to data-frame-columns, but run into a puzzling error when I try to do so. I am using dplyr 0.5.0, purrr 0.2.2, R 3.3.0.

The basic setup looks like this: I have a data frame d, some of whose columns are lists:

d <- dplyr::data_frame(
  A = list(
    list(list(x = "a", y = 1), list(x = "b", y = 2)),
    list(list(x = "c", y = 3), list(x = "d", y = 4))
  ),
  B = LETTERS[1:2]
)

I would like to convert the column of lists (in this case, d$A) to a column of data frames using the following function:

tblfy <- function(x) {
  x %>%
    purrr::transpose() %>%
    purrr::simplify_all() %>%
    dplyr::as_data_frame()
}

That is, I would like the list-column d$A to be replaced by the list lapply(d$A, tblfy), which is

[[1]]
# A tibble: 2 x 2
      x     y
  <chr> <dbl>
1     a     1
2     b     2

[[2]]
# A tibble: 2 x 2
      x     y
  <chr> <dbl>
1     c     3
2     d     4

Of course, in this simple case, I could just do a simple reassignment. The point, however, is that I would like to do this programmatically, ideally with dplyr, in a generally applicable way that could deal with any number of list-columns.

Here's where I stumble: When I try to convert the list-columns to data-frame-columns using the following application

d %>% dplyr::mutate_if(is.list, funs(tblfy))

I get an error message that I don't know how to interpret:

Error: Each variable must be named.
Problem variables: 1, 2

Why does mutate_if() fail? How can I properly apply it to get the desired result?

Remark

A commenter has pointed out that the function tblfy() should be vectorized. That is a reasonable suggestion. But — unless I have vectorized incorrectly — that does not seem to get at the root of the problem. Plugging in a vectorized version of tblfy(),

tblfy_vec <- Vectorize(tblfy)

into mutate_if() fails with the error

Error: wrong result size (4), expected 2 or 1
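One possible reading of this error, offered as a sketch rather than a confirmed diagnosis: Vectorize() is built on mapply(), whose default SIMPLIFY = TRUE tries to collapse equal-length results into an array. A data frame with two columns has length 2, so two such results collapse into a 2 x 2 list-matrix of length 4, which would match the reported size. The example below uses base R only; to_df() is a hypothetical stand-in for tblfy(), and A mirrors d$A.

```r
# Hypothetical base-R stand-in for tblfy(): one list of records -> one data frame.
to_df <- function(records) {
  do.call(rbind, lapply(records, as.data.frame, stringsAsFactors = FALSE))
}

# Same shape as d$A in the question.
A <- list(
  list(list(x = "a", y = 1), list(x = "b", y = 2)),
  list(list(x = "c", y = 3), list(x = "d", y = 4))
)

# Default SIMPLIFY = TRUE: the two 2-column results are collapsed into a
# 2 x 2 list-matrix, so the result has length 4 rather than 2.
simplified <- Vectorize(to_df)(A)
length(simplified)  # 4
dim(simplified)     # 2 2

# SIMPLIFY = FALSE keeps one data frame per element of A.
kept <- Vectorize(to_df, SIMPLIFY = FALSE)(A)
length(kept)        # 2
```

With simplification switched off, the vectorized function returns one data frame per input element, which is the shape a list-column of length 2 needs.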

Update

After gaining some experience with purrr, I now find the following approach natural, if somewhat long-winded:

d %>%
  map_if(is.list, ~ map(., ~ map_df(., identity))) %>%
  as_data_frame()

This is more or less identical to @alistaire's solution, below, but uses map_if() and map() in place of mutate_if() and Vectorize(), respectively.

2
So what exactly is the expected output? You want to change A from a list of lists to a list of tibbles? - MrFlick
Your function is not vectorized; it only accepts one list. Look at tblfy(d$A). There's an error because there are two lists in d$A. You are not comparing apples to apples. In your lapply(d$A, tblfy) you are giving your function one list at a time, as in tblfy(d$A[[1]]) and tblfy(d$A[[2]]), which is why it works. In your dplyr function you are supplying two lists. Change tblfy to accept more than one list, or change the dplyr call. Or as MrFlick asks, think more broadly about what you're building. - Pierre L
@MrFlick I have edited the question to make the desired output explicit. Is it clear now? - egnha
@PierreLafortune Good point. I had already tried vectorizing but it still failed. See the edited question. Presumably I am vectorizing incorrectly. But how? Oddly, when I apply tblfy_vec() directly to d$A, I get a list of 4, which doesn't jibe at all with my understanding that vectorizing creates a function that operates on a list (or vector) component-wise. - egnha
Try inserting an apply function, either Map or lapply. - Pierre L
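The suggestion in the comments can be sketched in base R, without dplyr: wrap the per-element converter in lapply() so that it accepts a whole column, then apply that wrapper to every list-column. Here to_df() is a hypothetical stand-in for tblfy(), and a plain data.frame stands in for the tibble d. This illustrates the idea only and is not necessarily how mutate_if() should be called.

```r
# Hypothetical base-R stand-in for tblfy(): one list of records -> one data frame.
to_df <- function(records) {
  do.call(rbind, lapply(records, as.data.frame, stringsAsFactors = FALSE))
}

# Plain data.frame with a list-column A, mirroring d from the question.
d <- data.frame(B = LETTERS[1:2], stringsAsFactors = FALSE)
d$A <- list(
  list(list(x = "a", y = 1), list(x = "b", y = 2)),
  list(list(x = "c", y = 3), list(x = "d", y = 4))
)

# Column-level wrapper: receives the whole column, converts each element.
convert_column <- function(col) lapply(col, to_df)

# Apply the wrapper to every list-column, however many there are.
list_cols <- names(d)[vapply(d, is.list, logical(1))]
for (nm in list_cols) {
  d[[nm]] <- convert_column(d[[nm]])
}

d$A[[1]]  # a data frame with columns x and y, one row per record
```

The key point is that the function applied to a column sees the entire column at once, so the element-wise step has to live inside it.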

2 Answers

7
votes

The original tblfy function errors out for me (even when its elements are chained directly), so let's rebuild it a bit, adding vectorization as well, which lets us avoid an otherwise-necessary prior rowwise() call:

tblfy <- Vectorize(function(x){x %>% purrr::map_df(identity) %>% list()})

Now we can use mutate_if nicely:

d %>% mutate_if(purrr::is_list, tblfy)
## Source: local data frame [2 x 2]
## 
##                A     B
##           <list> <chr>
## 1 <tbl_df [2,2]>     A
## 2 <tbl_df [2,2]>     B

...and if we unnest to see what's there,

d %>% mutate_if(purrr::is_list, tblfy) %>% tidyr::unnest()
## Source: local data frame [4 x 3]
## 
##       B     x     y
##   <chr> <chr> <dbl>
## 1     A     a     1
## 2     A     b     2
## 3     B     c     3
## 4     B     d     4

A couple of notes:

  • map_df(identity) seems to be more efficient at building a tibble than any of the alternative formulations. I know the identity call seems unnecessary, but most everything else breaks.
  • I'm not sure how widely useful tblfy will be, as it's somewhat dependent on the structure of the lists in the list column, which can vary enormously. If you have a lot with a similar structure, I suppose it's useful, though.
  • There may be a way to do this with pmap instead of Vectorize, but I can't get it to work with some cursory tries.

7
votes

In-place conversion without any copying:

library(data.table)

for (col in d) if (is.list(col)) lapply(col, setDF)

d
#Source: local data frame [2 x 2]
#
#                A B
#1 <S3:data.frame> A
#2 <S3:data.frame> B