41
votes

Let's say I have a list of data.frames

dflist <- list(data.frame(a=1:3), data.frame(b=10:12, a=4:6))

If I want to extract the first column from each item in the list, I can do

lapply(dflist, `[[`, 1)
# [[1]]
# [1] 1 2 3
# 
# [[2]]
# [1] 10 11 12

Why can't I use the "$" function in the same way?

lapply(dflist, `$`, "a")
# [[1]]
# NULL
# 
# [[2]]
# NULL

But these both work:

lapply(dflist, function(x) x$a)
`$`(dflist[[1]], "a")

I realize that in this case one could use

lapply(dflist, `[[`, "a")

but I was working with an S4 object that didn't seem to allow indexing via [[. For example

library(adegenet)
data(nancycats)
catpop <- genind2genpop(nancycats)
mylist <- list(catpop, catpop)

#works
mylist[[1]]$tab

#doesn't work
lapply(mylist, "$", "tab")
# Error in slot(x, name) : 
#   no slot of name "..." for this object of class "genpop"

#doesn't work
lapply(mylist, "[[", "tab")
# Error in FUN(X[[1L]], ...) : this S4 class is not subsettable
This one works lapply(dflist, function(x) "$"(x, "a")). - Tim
Nice question. Fyi, the answer is sort of findable with methods("$",dflist[[1]]) - Frank
Well @Frank, it's not that I was unaware that $.data.frame existed, I'm just surprised the problem was caused by method dispatch. I can't think of many other cases where you have to explicitly call one form of a generic function. - MrFlick
@MrFlick -- I'm with you on that. It must (?) have something to do with the odd lazy-evaluation aspects of lapply(), but since some (or, really, all) of that happens down at the level of C-code, I've never been fully able to grasp what it's doing under the hood. - Josh O'Brien
@JoshO'Brien I think it might have more to do with deparsing of parameters with $ than lapply specifically. See this example: f<-function(x,...) `$`(x, ...); f(dflist[[1]], "a"); `$`(dflist[[1]], "a"). This is because $ isn't a "typical" generic, it's a .Primitive(), so I bet the secret lies here - MrFlick

2 Answers

30
votes

For the first example, you can just do:

lapply(dflist, `$.data.frame`, "a")

For the second, use the slot() accessor function

lapply(mylist, "slot", "tab")
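The slot() approach can be tried without installing adegenet. The sketch below uses a hypothetical S4 class named Toy with a single slot tab, standing in for genpop; it is not the real class, only an illustration of why an ordinary function that evaluates its name argument works inside lapply.

```r
library(methods)

# Hypothetical stand-in for the genpop class: one slot named "tab"
setClass("Toy", representation(tab = "matrix"))
objs <- list(new("Toy", tab = matrix(1:4, 2)),
             new("Toy", tab = matrix(5:8, 2)))

# slot() is a regular function, so the string "tab" passed through
# lapply's ... is evaluated normally
tabs <- lapply(objs, slot, "tab")

# Equivalent wrapper using the @ operator with a literal slot name
tabs2 <- lapply(objs, function(x) x@tab)
```

Both calls return a list holding the two matrices.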

I'm not sure why method dispatch doesn't work in the first case, but the Note section of ?lapply does address this very issue of its borked method dispatch for primitive functions like $:

 Note:

 [...]

 For historical reasons, the calls created by ‘lapply’ are
 unevaluated, and code has been written (e.g., ‘bquote’) that
 relies on this.  This means that the recorded call is always of
 the form ‘FUN(X[[i]], ...)’, with ‘i’ replaced by the current
 (integer or double) index.  This is not normally a problem, but it
 can be if ‘FUN’ uses ‘sys.call’ or ‘match.call’ or if it is a
 primitive function that makes use of the call.  This means that it
 is often safer to call primitive functions with a wrapper, so that
 e.g. ‘lapply(ll, function(x) is.numeric(x))’ is required to ensure
 that method dispatch for ‘is.numeric’ occurs correctly.
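
The wrapper advice from the note can be sketched with the dflist from the question. Inside the wrapper the call x$a is written out literally, so `$` sees the name a rather than the ... that lapply forwards.

```r
dflist <- list(data.frame(a = 1:3), data.frame(b = 10:12, a = 4:6))

# The wrapper builds the call x$a itself, so `$` receives the symbol a
cols <- lapply(dflist, function(x) x$a)

# Same wrapper pattern as the is.numeric example in ?lapply
num <- lapply(dflist, function(x) is.numeric(x$a))
```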
13
votes

So it seems that this problem has more to do with $ and how it typically expects unquoted names as the second parameter rather than strings. Look at this example

dflist <- list(
    data.frame(a=1:3, z=31:33), 
    data.frame(b=10:12, a=4:6, z=31:33)
)
lapply(dflist, 
    function(x, z) {
        print(paste("z:",z)); 
        `$`(x,z)
    }, 
    z="a"
)

We see the results

[1] "z: a"
[1] "z: a"
[[1]]
[1] 31 32 33

[[2]]
[1] 31 32 33

so the z value is being set to "a", but $ isn't evaluating the second parameter. So it's returning the "z" column rather than the "a" column. This leads to this interesting set of results

a<-"z"; `$`(dflist[[1]], a)
# [1] 1 2 3
a<-"z"; `$`(dflist[[1]], "z")
# [1] 31 32 33

a<-"z"; `$.data.frame`(dflist[[1]], a)
# [1] 31 32 33
a<-"z"; `$.data.frame`(dflist[[1]], "z")
# [1] 31 32 33

When we call $.data.frame directly we are bypassing the standard deparsing that occurs in the primitive prior to dispatching (which happens near here in the source).
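
The same contrast can be sketched without calling a method directly, by comparing `$` with `[[`, which does evaluate its index. The data frame df and the variable a below are illustrative names only.

```r
df <- data.frame(a = 1:3, z = 31:33)
a <- "z"  # a variable whose value names a different column

# Same as df$a: the symbol a itself is used as the column name
by_dollar <- `$`(df, a)

# a is evaluated to "z" first, so this returns the z column
by_bracket <- df[[a]]
```

Here by_dollar is the a column and by_bracket is the z column.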

The added catch with lapply is that it passes along arguments to the function via the ... mechanism. For example

lapply(dflist, function(x, z) sys.call())
# [[1]]
# FUN(X[[2L]], ...)

# [[2]]
# FUN(X[[2L]], ...)

This means that when $ is invoked, it deparses the ... to the string "...". This explains this behavior

dflist <- list(data.frame(a=1:3, "..."=11:13, check.names=FALSE))
lapply(dflist, `$`, "a")
# [[1]]
# [1] 11 12 13

Same thing happens when you try to use ... yourself

f<-function(x,...) `$`(x, ...); 

f(dflist[[1]], "a");
# [1] 11 12 13
`$`(dflist[[1]], "a")
# [1] 1 2 3
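
One way around this, sketched here as a suggestion rather than as part of the original answer, is to take the name as an ordinary argument and index with `[[`, or to use base R's getElement(), both of which evaluate the name. The helper f2 is a hypothetical name.

```r
dflist <- list(data.frame(a = 1:3), data.frame(b = 10:12, a = 4:6))

# Take the name as a regular argument and index with [[,
# which evaluates it instead of deparsing it
f2 <- function(x, name) x[[name]]
res_f2 <- f2(dflist[[1]], "a")

# getElement() is an ordinary function, so it can be passed to lapply directly
res_ge <- lapply(dflist, getElement, "a")
```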