I'm sure this is kind of basic, but I'd just like to really understand the logic of R data structures here.

If I subset a matrix by index out of bounds, I get exactly that error:

m <- matrix(data = c("foo", "bar"), nrow = 1)
m[2,]
# Error in m[2, ] : subscript out of bounds

If I do the same to a data frame, however, I get a row of all NAs:

df <- data.frame(foo = "foo", bar = "bar")
df[2,]
#    foo  bar
# NA <NA> <NA>

If I subset a non-existent data frame column, I get the familiar error:

df[, 3]
# Error in `[.data.frame`(df, , 3) : undefined columns selected

I know (roughly) that data frame rows are weird and to be treated carefully, but I don't quite see the connection to the above behavior.

Can someone explain why R behaves in this way for non-existent df rows?

Update

To be sure, returning NA for out-of-bounds subsets is normal R behavior for 1D vectors:

vec <- c("foo", "bar")
vec[3]
# [1] NA

So in a way, the odd one out here is matrix subsetting, not data frame subsetting, depending on where you're starting from. Still, the differing 2D subsetting behavior (m[2, ] vs df[2, ]) might strike a dense user (as I am right now) as inconsistent.

Regarding "I know (roughly) that data frame rows are weird and to be treated carefully": this is because data frames are heterogeneous (i.e. you can store different types in different columns), so they cannot be treated as vectors by default and need to be handled... well... carefully. As for the indexing, I'm not sure. It would be interesting to find out why, though. - Sotos
Same with lists. Create a list l1 <- list('foo', 'bar'). Now try the following: 1) l1[3], 2) l1[[3]] - Sotos
Maybe an atomic vector, or a similar object of atomic nature, gives rise to this peculiarity. is.atomic(a_matrix) yields TRUE, while for a list or a data.frame this condition is not satisfied. - dd_rookie
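
For readers who want to try the list experiment suggested in the comments, here is a minimal sketch. The names via_single and via_double are only illustrative, and the error text shown in the comment assumes an English locale.

```r
l1 <- list("foo", "bar")

# `[` on a list returns a list; an out-of-bounds position yields a NULL element
via_single <- l1[3]
str(via_single)

# `[[` on the same position signals an error, caught here so the script continues
via_double <- tryCatch(l1[[3]], error = function(e) conditionMessage(e))
via_double  # in an English locale: "subscript out of bounds"
```

So lists mirror the split seen in the question: `[` quietly pads the result, while `[[` refuses.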

1 Answer

Can someone explain why R behaves in this way[?]

Short answer: No, probably not.


Longer answer: Once upon a time I was thinking about something similar and read this thread on R-devel: Definition of [[. Basically it boils down to:

The semantics of [ and [[ don't seem to be fully specified in the Reference manual. [...] I assume that these are features, not bugs, but I can't find documentation for them

Duncan Murdoch, a former member of the R core team, gives a very nice reply:

There is more documentation in the man page for Extract, but I think it is incomplete. The most complete documentation is of course the source code*, but it may not answer the question of what's intentional and what's accidental

As mentioned in the R-devel thread, the only description in the manual is 3.4.1 Indexing by vectors:

If i is positive and exceeds length(x) then the corresponding selection is NA

But this applies only to "indexing of simple vectors". Similar out-of-bounds indexing for "non-simple" vectors does not seem to be described. Duncan Murdoch again:

So what is a simple vector? That is not explicitly defined, and it probably should be.
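
To make the quoted rule concrete, the sketch below applies it to a plain numeric vector and then to the matrix from the question. It only shows observed behavior and does not settle what counts as a "simple vector". The variable names are illustrative.

```r
x <- c(10, 20, 30)
x[5]          # NA, as the manual describes for a positive index beyond length(x)
x[c(1, 5)]    # 10 NA
typeof(x[5])  # "double": the NA keeps the vector's type

m <- matrix(c("foo", "bar"), nrow = 1)
m[3]          # NA: with a single subscript the matrix is indexed like a vector

# With two subscripts, the same matrix signals an error instead
matrix_row <- tryCatch(m[2, ], error = function(e) "error")
matrix_row    # "error"
```

Note that the matrix is atomic in both calls, so the difference comes from the form of the subscript rather than from the object alone.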

Thus, it may seem like no one knows the answer to your why question.


See also "8.2.13 nonexistent value in subscript" in the excellent R Inferno by Patrick Burns, and the section "Missing/out of bounds indices" in Hadley Wickham's book Advanced R.


*Source code for the [ subset operator. A search for R_MSG_subs_o_b (which corresponds to the error message "subscript out of bounds") provides no obvious clue as to why OOB [ indexing of matrices and OOB [[ indexing give an error, whereas OOB [ indexing of "simple vectors" results in NA.
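
Since the documentation leaves the rules open, one practical option is to tabulate the behavior empirically. The sketch below does that for the cases discussed here. oob_outcome is a hypothetical helper written for this illustration, not part of R, and the exact error messages depend on locale and R version.

```r
# Hypothetical helper: evaluate an expression and report whether it
# returned a value or signalled an error (with the error message).
oob_outcome <- function(expr) {
  tryCatch({
    force(expr)  # the argument is evaluated lazily, so errors surface here
    "value"
  }, error = function(e) paste("error:", conditionMessage(e)))
}

vec <- c("foo", "bar")
m   <- matrix(c("foo", "bar"), nrow = 1)
df  <- data.frame(foo = "foo", bar = "bar")
l1  <- list("foo", "bar")

outcomes <- c(
  vector_single = oob_outcome(vec[3]),
  vector_double = oob_outcome(vec[[3]]),
  matrix_row    = oob_outcome(m[2, ]),
  df_row        = oob_outcome(df[2, ]),
  df_column     = oob_outcome(df[, 3]),
  list_single   = oob_outcome(l1[3]),
  list_double   = oob_outcome(l1[[3]])
)
print(outcomes)
```

On the objects above, the vec[3], df[2, ] and l1[3] cases return a value, and the remaining cases signal an error. This records the pattern but, like the source code, does not explain it.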