1
votes

I have been searching in vain to find a solution to efficiently combine specific columns from separate data frames or from separate objects within the same list into new data frames. I have 5 data frames that all have the same row names (e.g. x, y, z) and the same column names (V1, V2, V3,...V1000).

   V1   V2   V3   V4...      V1    V2   V3   V4...
x   1    5    8    9      x   2     7    9    5
y   2    7    4    8      y   4     6    5    6
z   4    4    5    9      z   3     4    8    7

What I need to do is to extract V1 from all five frames and make a new data frame with those five V1 columns, and then do the same for the remaining V2-V1000 to get something like:

   V1   V1   V1   V1   V1         V2    V2   V2   V2   V2
x   1    5    8    9    5      x   2     7    9    5    5
y   2    7    4    8    8      y   4     6    5    6    4
z   4    4    5    9    7      z   3     4    8    7    7

Eventually, I need to apply another function to all 1000 of the newly created data frames so if there is a way to loop this merging process and place the 1000 new frames into a new set of frames, that would be ideal.

I have tried various forms of merge, cbind, sapply and other solutions I have found suggested here and elsewhere on the web. The best I've been able to come up with is getting all five data frames into a single list and then using sapply to merge V1 from each object using sapply(y, "[[",2) where y is the list and 2 corresponds to the V1 column in each object. However, I can't seem to get this looped. I could use this code to manually create the 1000 new data frames but that would take forever.

Any suggestions or directions to other answers that might work would be appreciated!

1

1 Answers

2
votes

We can loop through the column names and extract the columns from the 'data.frame's and cbind it.

nm1 <- paste0("V", 1:1000)
lst <- lapply(nm1, function(x) 
           cbind(df1[x], df2[x], df3[x], df4[x], df5[x]))

The output is a list of data.frames. It is better to keep them as a list instead of creating separate objects in the global environment. But if we need so,

 list2env(setNames(lst, paste0("df_new", seq_along(lst))),
               envir = .GlobalEnv)

data

set.seed(24)
df1 <- as.data.frame(matrix(sample(0:10, 3*1000, 
 replace=TRUE), ncol=1000, dimnames=list(c('x', 'y', 'z'), 
  paste0("V", 1:1000))) )
df2 <- as.data.frame(matrix(sample(0:10, 3*1000, 
 replace=TRUE), ncol=1000, dimnames=list(c('x', 'y', 'z'),
     paste0("V", 1:1000))) )
df3 <- as.data.frame(matrix(sample(0:10, 3*1000,
   replace=TRUE), ncol=1000, dimnames=list(c('x', 'y', 'z'), 
   paste0("V", 1:1000))) )
df4 <- as.data.frame(matrix(sample(0:10, 3*1000, 
   replace=TRUE), ncol=1000, dimnames=list(c('x', 'y', 'z'),
   paste0("V", 1:1000))) )

 df5 <- as.data.frame(matrix(sample(0:10, 3*1000, 
    replace=TRUE), ncol=1000, dimnames=list(c('x', 'y', 'z'), 
   paste0("V", 1:1000))) )