I am trying to figure out why the rbind function is not working as intended when joining data.frames without names. Here is my testing:

test <- data.frame(
            id=rep(c("a","b"),each=3),
            time=rep(1:3,2),
            black=1:6,
            white=1:6,
            stringsAsFactors=FALSE
            )

# take some subsets with different names
pt1 <- test[,c(1,2,3)]
pt2 <- test[,c(1,2,4)]

# method 1 - rename to same names - works
names(pt2) <- names(pt1)
rbind(pt1,pt2)

# method 2 - works - even with duplicate names
names(pt1) <- letters[c(1,1,1)]
names(pt2) <- letters[c(1,1,1)]
rbind(pt1,pt2)

# method 3 - works - with a vector of NAs as names
names(pt1) <- rep(NA,ncol(pt1))
names(pt2) <- rep(NA,ncol(pt2))
rbind(pt1,pt2)

# method 4 - but... does not work without names at all?
pt1 <- unname(pt1)
pt2 <- unname(pt2)
rbind(pt1,pt2)

This seems a bit odd to me. Am I missing a good reason why this shouldn't work out of the box?

Edit for additional info:

Using @JoshO'Brien's suggestion to debug, I traced the error to this if statement in the rbind.data.frame function:

if (is.null(pi) || is.na(jj <- pi[[j]]))

(online version of code here: http://svn.r-project.org/R/trunk/src/library/base/R/dataframe.R starting at: "### Here are the methods for rbind and cbind.")

From stepping through the program, the value of pi does not appear to have been set at this point, hence the program tries to index the built-in constant pi like pi[[3]] and errors out.
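
The fallback to the built-in constant can be reproduced outside rbind.data.frame. The sketch below is not the rbind code: lookup_pi and its set_local argument are hypothetical names made up for illustration. It only shows that a function which never assigns a local pi resolves the name to the built-in constant, and that indexing that length-one value at position 3 raises an error.

```r
# Hypothetical helper (not part of base R): it assigns a local `pi`
# only when asked to, then returns whatever `pi` resolves to.
lookup_pi <- function(set_local) {
  if (set_local) pi <- c(1L, 2L, 3L)   # a local variable shadows base::pi
  pi                                   # otherwise R finds the built-in constant
}

local_value    <- lookup_pi(TRUE)      # the local integer vector
fallback_value <- lookup_pi(FALSE)     # the built-in constant

# The built-in constant has length 1, so asking for element 3 fails.
# tryCatch() captures the error object instead of stopping the script.
index_result <- tryCatch(fallback_value[[3]], error = function(e) e)

print(local_value)
print(fallback_value)
print(inherits(index_result, "error"))
```

This matches the symptom seen while stepping through the debugger: nothing in the function body created pi, so the lookup fell through to the constant in base.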

From what I can figure, the internal pi object doesn't appear to be set due to this earlier line where clabs has been initialized as NULL:

if (is.null(clabs)) clabs <- names(xi) else { #pi gets set here

I am in a tangle trying to figure this out, but will update as it comes together.

Have a peek at the code of rbind.data.frame, most of which is concerned with checking and matching column and row names. You could do debug(rbind.data.frame) and then step through your method 4 to determine exactly where the error gets thrown. – Josh O'Brien
@JoshO'Brien - I have updated the question with some more info. I'm not so great at interpreting code and am working at it, but maybe it will be obvious to someone else. – thelatemail

1 Answer

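Because unname() and explicitly assigning NA as column names are not the same operation. unname() removes the names entirely, so names() returns NULL, whereas assigning NA leaves a vector of NA names in place. rbind() works from the names/colnames of the data frames: when every name is NA it still has names to match, so the bind is possible, but when the names are NULL there is nothing to match and rbind() fails.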

Here is some code to help see what I mean:

> c1 <- c(1,2,3)
> c2 <- c('A','B','C')
> df1 <- data.frame(c1,c2)
> df1
  c1 c2
1  1  A
2  2  B
3  3  C
> df2 <- data.frame(c1,c2) # df1 & df2 are identical
>
> #Let's perform unname on one data frame &
> #replacement with NA on the other
>
> unname(df1)
  NA NA
1  1  A
2  2  B
3  3  C
> tem1 <- names(unname(df1))
> tem1
NULL
>
> #Please note above that the column headers though showing as NA are null
>
> names(df2) <- rep(NA,ncol(df2))
> df2
  NA NA
1  1  A
2  2  B
3  3  C
> tem2 <- names(df2)
> tem2
[1] NA NA
> 
> #Though unname(df1) & df2 look identical, they aren't
> #Also note difference in tem1 & tem2
>
> identical(unname(df1),df2)
[1] FALSE
> 

I hope this helps. The names print as NA in both cases, but the two operations are different.

Hence, two data frames whose column names have been replaced with NA can be "rbound", but two data frames with no column names at all (as produced by unname()) cannot.
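
If the goal is simply to stack unnamed data frames, one possible workaround is to give them matching placeholder names before binding and strip the names afterwards. This is a minimal sketch, not something rbind() offers itself: rbind_unnamed and the "V1", "V2", ... placeholder names are hypothetical, and the helper assumes every input has the same number of columns in the same order.

```r
# Same example data as in the question.
test <- data.frame(
  id = rep(c("a", "b"), each = 3),
  time = rep(1:3, 2),
  black = 1:6,
  white = 1:6,
  stringsAsFactors = FALSE
)
pt1 <- unname(test[, c(1, 2, 3)])
pt2 <- unname(test[, c(1, 2, 4)])

# unname() drops the names; assigning NA keeps a vector of NA names.
names_after_unname <- names(pt1)
pt_na <- pt1
names(pt_na) <- rep(NA, ncol(pt_na))
names_after_na <- names(pt_na)

# Hypothetical helper: give every data frame the same placeholder names,
# bind the rows, then strip the names again.
rbind_unnamed <- function(...) {
  dfs <- list(...)
  placeholder <- paste0("V", seq_len(ncol(dfs[[1]])))
  dfs <- lapply(dfs, function(d) {
    names(d) <- placeholder
    d
  })
  unname(do.call(rbind, dfs))
}

combined <- rbind_unnamed(pt1, pt2)
print(names_after_unname)
print(names_after_na)
print(dim(combined))
```

Because the helper always supplies names before calling rbind(), it does not depend on how rbind() treats data frames whose names are NULL.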