1
votes

I want to apply operations to a data frame (tibble) column that contains S3 list-like objects, acting on one named item from each object in the column. As shown at the end of the question, I have this working with sapply() inside mutate(), but that seems like it should be unnecessary.

When the information is stored in columns of atomic data, dplyr functions like mutate() work as expected. For example:

library(dplyr)
people_cols <- tibble(name = c("Fiona Foo", "Barry Bar", "Basil Baz"),
                  height_mm = c(1750, 1700, 1800),
                  weight_kg = c(75, 73, 74)) %>%
  mutate(height_inch = height_mm / 25.4)
people_cols
# # A tibble: 3 × 4
#   name          height_mm   weight_kg   height_inch
#   <chr>         <dbl>       <dbl>       <dbl>
# 1 Fiona Foo     1750        75          68.89764
# 2 Barry Bar     1700        73          66.92913
# 3 Basil Baz     1800        74          70.86614

But I want to work with data in S3 list objects. Here is a toy example:

person_stats <- function(name, height_mm, weight_kg) {
  # Return a named list tagged with the S3 class "person_stats"
  structure(list(name = name,
                 height_mm = height_mm,
                 weight_kg = weight_kg),
            class = "person_stats")
}

fiona <- person_stats("Fiona Foo", 1750, 75)
barry <- person_stats("Barry Bar", 1700, 73)
basil <- person_stats("Basil Baz", 1800, 74)

fiona$height_mm
# [1] 1750

I can put these objects into a tibble column like this:

people <- tibble(personstat = list(fiona, barry, basil))

people
# # A tibble: 3 × 1
#           personstat
#               <list>
# 1 <S3: person_stats>
# 2 <S3: person_stats>
# 3 <S3: person_stats>

But if I try to use mutate() on the column containing those objects I get errors:

people <- tibble(personstat = list(fiona, barry, basil)) %>%
  mutate(height_inch = personstat$height_mm / 25.4)
# Error in mutate_impl(.data, dots) : object 'personstat' not found

To keep it as simple as possible: if I could at least reference the named items on their own, I could get them into a new column and then do whatever operations I need on that column:

people <- tibble(personstat = list(fiona, barry, basil)) %>%
  mutate(height_mm = personstat$height_mm)
# Error in mutate_impl(.data, dots) : 
#  Unsupported type NILSXP for column "height_mm"

Note the different error, which is interesting: it no longer complains about finding the column, only about the named item.
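
A plausible reading of that second error is that, inside mutate(), personstat is the whole list column, so `$` is applied to the outer list rather than to each object in it. The base R sketch below illustrates this under that assumption; the variable name `column` is introduced here only for illustration, and the exact error text depends on the dplyr version.

```r
# Rebuild the toy objects from the question so this runs on its own.
person_stats <- function(name, height_mm, weight_kg) {
  structure(list(name = name, height_mm = height_mm, weight_kg = weight_kg),
            class = "person_stats")
}

# `column` (hypothetical name) stands in for the list column of the tibble.
column <- list(person_stats("Fiona Foo", 1750, 75),
               person_stats("Barry Bar", 1700, 73),
               person_stats("Basil Baz", 1800, 74))

# `$` looks for an element named height_mm in the outer list, which has none.
column$height_mm
# NULL

# Picking out one element first reaches the object itself.
column[[1]]$height_mm
# [1] 1750
```

So any working approach has to apply the extraction once per element, which is what sapply() does explicitly.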

I can get it to work using base functions, cbind() and sapply() with [[ as the function:

people <- tibble(personstat = list(fiona, barry, basil)) %>%
  cbind(height_mm = sapply(.$personstat, '[[', name="height_mm"))

people
#            personstat height_mm
# 1 Fiona Foo, 1750, 75      1750
# 2 Barry Bar, 1700, 73      1700
# 3 Basil Baz, 1800, 74      1800

That loses the tibble class, though:

class(people)
# [1] "data.frame"

Finally, that got me to the version below, which works. But using sapply() seems to miss the point of dplyr's mutate(), which I think should operate all the way down a column without it:

people <- tibble(personstat = list(fiona, barry, basil)) %>%
  mutate(height_mm = sapply(.$personstat, '[[', name="height_mm"))
people
# A tibble: 3 x 2
#           personstat height_mm
#               <list>     <dbl>
# 1 <S3: person_stats>      1750
# 2 <S3: person_stats>      1700
# 3 <S3: person_stats>      1800

Is there any way of using mutate() to get the output as above, without having to rely on something like sapply()? Or, indeed, any other sensible ways of extracting named values from within list-like S3 objects stored in a column of a tibble?

2 Answers

2
votes

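rowwise() can handle this case. It evaluates the expression one row at a time, so personstat refers to a single object rather than the whole list column: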

people <- tibble(personstat = list(fiona, barry, basil))

people %>%
    rowwise() %>%
    mutate(height_mm = personstat$height_mm)
# # A tibble: 3 × 2
#           personstat height_mm
#               <list>     <dbl>
# 1 <S3: person_stats>      1750
# 2 <S3: person_stats>      1700
# 3 <S3: person_stats>      1800

people %>%
    rowwise() %>%
    mutate(height_inch = personstat$height_mm / 25.4)

# # A tibble: 3 × 2
#           personstat height_inch
#               <list>       <dbl>
# 1 <S3: person_stats>    68.89764
# 2 <S3: person_stats>    66.92913
# 3 <S3: person_stats>    70.86614
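
One thing to keep in mind is that the result of rowwise() stays row-wise, so later verbs also run one row at a time. A minimal sketch, assuming a reasonably recent dplyr, that builds both columns in one mutate() and then drops the row-wise grouping with ungroup() (the name `people_heights` is only illustrative):

```r
library(dplyr)

# Rebuild the toy data from the question so this runs on its own.
person_stats <- function(name, height_mm, weight_kg) {
  structure(list(name = name, height_mm = height_mm, weight_kg = weight_kg),
            class = "person_stats")
}
people <- tibble(personstat = list(person_stats("Fiona Foo", 1750, 75),
                                   person_stats("Barry Bar", 1700, 73),
                                   person_stats("Basil Baz", 1800, 74)))

people_heights <- people %>%
  rowwise() %>%
  # Within each row, personstat is a single person_stats object.
  mutate(height_mm = personstat$height_mm,
         height_inch = height_mm / 25.4) %>%
  # Return to an ordinary tibble so later steps are vectorised again.
  ungroup()
```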

1
votes

If you would like to keep it in the tidyverse, you could use purrr::map_dbl() here. Passing a string to map_dbl() extracts the element of that name from each object:

library(tidyverse)
people %>% mutate(height = map_dbl(personstat, "height_mm"))
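
For comparison, base R's vapply() offers a similar type guarantee to map_dbl(): it errors unless every element yields exactly one numeric value, whereas sapply() returns whatever shape it ends up with. A sketch without purrr, where the name `people_vapply` is only illustrative:

```r
library(dplyr)

# Rebuild the toy data from the question so this runs on its own.
person_stats <- function(name, height_mm, weight_kg) {
  structure(list(name = name, height_mm = height_mm, weight_kg = weight_kg),
            class = "person_stats")
}
people <- tibble(personstat = list(person_stats("Fiona Foo", 1750, 75),
                                   person_stats("Barry Bar", 1700, 73),
                                   person_stats("Basil Baz", 1800, 74)))

people_vapply <- people %>%
  # numeric(1) declares that each object must yield one numeric value.
  mutate(height_mm = vapply(personstat, function(p) p$height_mm, numeric(1)),
         height_inch = height_mm / 25.4)
```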