I would like to use dplyr's mutate_if() function to convert list-columns to data-frame-columns, but I run into a puzzling error when I try to do so. I am using dplyr 0.5.0, purrr 0.2.2, and R 3.3.0.
The basic setup looks like this: I have a data frame d, some of whose columns are lists:
    library(dplyr)  # attaches %>% and funs(), which are used unqualified below

    d <- dplyr::data_frame(
      A = list(
        list(list(x = "a", y = 1), list(x = "b", y = 2)),
        list(list(x = "c", y = 3), list(x = "d", y = 4))
      ),
      B = LETTERS[1:2]
    )
I would like to convert the column of lists (in this case, d$A) to a column of data frames using the following function:
    tblfy <- function(x) {
      x %>%
        purrr::transpose() %>%
        purrr::simplify_all() %>%
        dplyr::as_data_frame()
    }
That is, I would like the list-column d$A to be replaced by the list lapply(d$A, tblfy), which is
    [[1]]
    # A tibble: 2 x 2
          x     y
      <chr> <dbl>
    1     a     1
    2     b     2

    [[2]]
    # A tibble: 2 x 2
          x     y
      <chr> <dbl>
    1     c     3
    2     d     4
Of course, in this simple case, I could just do a simple reassignment. The point, however, is that I would like to do this programmatically, ideally with dplyr, in a generally applicable way that could deal with any number of list-columns.
Here's where I stumble: when I try to convert the list-columns to data-frame-columns using the following call,
    d %>% dplyr::mutate_if(is.list, funs(tblfy))
I get an error message that I don't know how to interpret:
Error: Each variable must be named.
Problem variables: 1, 2
Why does mutate_if() fail? How can I properly apply it to get the desired result?
Remark
A commenter has pointed out that the function tblfy() should be vectorized. That is a reasonable suggestion. But, unless I have vectorized incorrectly, that does not seem to get at the root of the problem. Plugging a vectorized version of tblfy(),
    tblfy_vec <- Vectorize(tblfy)
into mutate_if() fails with the error
Error: wrong result size (4), expected 2 or 1
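A possible explanation for the size of 4, offered as a sketch rather than a confirmed diagnosis: Vectorize() is built on mapply(), whose SIMPLIFY argument defaults to TRUE. When every result is a data frame with two columns, the simplification step flattens the results into a 2 x 2 list-matrix of length 4. The base-R illustration below uses to_df(), a hypothetical stand-in for tblfy() that avoids any dependence on package versions.

```r
# Hypothetical base-R stand-in for tblfy(): list of records -> data frame.
to_df <- function(x) {
  do.call(rbind, lapply(x, function(r) as.data.frame(r, stringsAsFactors = FALSE)))
}

# Same nested structure as d$A in the question.
A <- list(
  list(list(x = "a", y = 1), list(x = "b", y = 2)),
  list(list(x = "c", y = 3), list(x = "d", y = 4))
)

# Default: mapply(..., SIMPLIFY = TRUE) collapses the two data frames
# into a list-matrix with one cell per column.
simplified <- Vectorize(to_df)(A)

# SIMPLIFY = FALSE keeps one data frame per element of A.
unsimplified <- Vectorize(to_df, SIMPLIFY = FALSE)(A)

length(simplified)    # 4
length(unsimplified)  # 2
```

Under that reading, passing SIMPLIFY = FALSE to Vectorize() would give a result of the length that mutate_if() expects, although I have only checked this with the stand-in function above.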
Update
After gaining some experience with purrr, I now find the following approach natural, if somewhat long-winded:
    library(purrr)  # map_if(), map(), map_df()

    d %>%
      map_if(is.list, ~ map(., ~ map_df(., identity))) %>%
      as_data_frame()
This is more or less identical to @alistaire's solution, below, but uses map_if() and map() in place of mutate_if() and Vectorize(), respectively.
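The structure shared by both approaches is a two-level mapping: an outer step that selects the list-columns, and an inner step that converts each cell of such a column separately. A package-free sketch of that structure is given below. It assumes a plain data.frame named d0 in place of the tibble d, and to_df() is a hypothetical base-R stand-in for tblfy(); neither name comes from the original code.

```r
# Hypothetical base-R stand-in for tblfy(): list of records -> data frame.
to_df <- function(x) {
  do.call(rbind, lapply(x, function(r) as.data.frame(r, stringsAsFactors = FALSE)))
}

# Plain data.frame with one list-column, mirroring d from the question.
d0 <- data.frame(B = LETTERS[1:2], stringsAsFactors = FALSE)
d0$A <- list(
  list(list(x = "a", y = 1), list(x = "b", y = 2)),
  list(list(x = "c", y = 3), list(x = "d", y = 4))
)

# Outer step: find the list-columns. Inner lapply(): convert each cell,
# so the converter only ever sees one element of the column at a time.
list_cols <- names(d0)[vapply(d0, is.list, logical(1))]
for (nm in list_cols) {
  d0[[nm]] <- lapply(d0[[nm]], to_df)
}

d0$A[[1]]  # data frame with columns x and y, one row per record
```

The point of the inner lapply() is that the conversion function is written for a single cell, so something has to walk the column for it; handing it the whole column at once is a different operation.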
... tblfy(d$A). There's an error because there are two lists in d$A. You are not comparing apples to apples. In your lapply(d$A, tblfy) you are giving your function one list at a time, that's why it works. tblfy(d$A[[1]]) and tblfy(d$A[[2]]). In your dplyr function you are supplying two lists. Change tblfy to accept more than one list, or change the dplyr call. Or as MrFlick asks, think more broadly about what you're building. – Pierre L

... tblfy_vec() directly to d$A, I get a list of 4, which doesn't jibe at all with my understanding that vectorizing creates a function that operates on a list (or vector) component-wise. – egnha