Using multiple data frames to introduce new variables into each other in R

Question

I've got three data frames (Df1, Df2, Df3). These data frames have some variables in common, but each also contains some unique variables. I'd like to make sure that all variables are represented in all data frames. For example, material is present in Df2 but not in Df1, so I'd like to create a variable named material in Df1 and set it to NA. Thanks for any help.

Starting point (dfs):

Df1 <- data.frame("color"=c(1,1,1),"price"=c(1,1,1),"buyer"=c(1,1,1))
Df2 <- data.frame("color"=c(1,1,1),"material"=c(1,1,1),"size"=c(1,1,1))
Df3 <- data.frame("color"=c(1,1,1),"price"=c(1,1,1),"key"=c(1,1,1))

Desired outcome (dfs):

Df1 <- data.frame("color"=c(1,1,1),"price"=c(1,1,1),"material"=c(NA,NA,NA),"buyer"=c(1,1,1),"size"=c(NA,NA,NA),"key"=c(NA,NA,NA))
Df2 <- data.frame("color"=c(1,1,1),"price"=c(NA,NA,NA),"material"=c(1,1,1),"buyer"=c(NA,NA,NA),"size"=c(1,1,1),"key"=c(NA,NA,NA))
Df3 <- data.frame("color"=c(1,1,1),"price"=c(1,1,1),"material"=c(NA,NA,NA),"buyer"=c(NA,NA,NA),"size"=c(NA,NA,NA),"key"=c(1,1,1))

My code so far: I'm trying to compare the variable names in an individual data frame with the variable names in all three data frames, and use the ones not present in the individual data frame to generate new variables set to NA. But I end up with: Error in VarDf1[, NewVariables] <- NA : incorrect number of subscripts on matrix. I don't know how to fix it.

dfs <- list(Df1,Df2,Df3)  
numdfs <- length(dfs)
for (i in 1:numdfs) 
{
  VarDf1 <- as.vector(names(Df1)) 
  VarDf2 <- as.vector(names(Df2))
  VarDf3 <- as.vector(names(Df3))
  VarAll <- c(VarDf1, VarDf2,VarDf3)
  NewVariables <- as.vector(setdiff(VarAll, dfs[i]))  
  dfs[i][ , NewVariables] <- NA
}

James James · Accepted Answer · 2017-10-03T11:59:16

rbind.fill from the plyr package does what you expect while also combining everything into a big data.frame:

plyr::rbind.fill(Df1,Df2,Df3)
  color price buyer material size key
1     1     1     1       NA   NA  NA
2     1     1     1       NA   NA  NA
3     1     1     1       NA   NA  NA
4     1    NA    NA        1    1  NA
5     1    NA    NA        1    1  NA
6     1    NA    NA        1    1  NA
7     1     1    NA       NA   NA   1
8     1     1    NA       NA   NA   1
9     1     1    NA       NA   NA   1

You can then subset the rows back out into separate data.frames.
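
If you would rather keep three separate data frames and skip the stacking step, here is a minimal base R sketch of the loop idea from the question. This is a different technique from rbind.fill, not a reimplementation of it. One likely cause of the error in the original loop is that dfs[i] returns a list of length one rather than the data frame itself; dfs[[i]] extracts the data frame. The helper name add_missing and the variable all_vars are made up for illustration, and the resulting column order follows first appearance across the data frames, which may differ from the order shown in the question.

```r
Df1 <- data.frame(color = c(1, 1, 1), price = c(1, 1, 1), buyer = c(1, 1, 1))
Df2 <- data.frame(color = c(1, 1, 1), material = c(1, 1, 1), size = c(1, 1, 1))
Df3 <- data.frame(color = c(1, 1, 1), price = c(1, 1, 1), key = c(1, 1, 1))

dfs <- list(Df1 = Df1, Df2 = Df2, Df3 = Df3)

# Union of all column names, in order of first appearance
all_vars <- unique(unlist(lapply(dfs, names), use.names = FALSE))

# add_missing() is a hypothetical helper, not part of any package
add_missing <- function(df, vars) {
  # Columns present somewhere else but absent from this data frame
  for (v in setdiff(vars, names(df))) {
    df[[v]] <- rep(NA, nrow(df))
  }
  # Return the columns in the same order for every data frame
  df[vars]
}

dfs <- lapply(dfs, add_missing, vars = all_vars)

# Individual data frames can be pulled back out by name
Df1 <- dfs$Df1
Df2 <- dfs$Df2
Df3 <- dfs$Df3
```

Working on the list with lapply avoids hard-coding Df1, Df2 and Df3 inside the loop, so the same sketch should extend to any number of data frames.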
