Combine data.frames in R using only common row.names

Question

I have five data.frames with gene expression data for different sets of samples. Each data.frame has a different number of rows, so the row.names (genes) overlap only partly.

Now I want a) to filter the five data.frames so they contain only genes that are present in all of them and b) to combine the gene expression data for those genes into one data.frame.

All I could find so far was merge, but that can only merge two data.frames, so I'd have to use it multiple times. Is there an easier way?

Are there differences in the data between the data frames? In other words, do you want to obtain five rows or one row per unique row name? — Sven Hohenstein
I want one row per unique row name, but multiple columns (one for each sample). — Lilith-Elina

Sven Hohenstein · Accepted Answer · 2013-05-29T08:33:07

Merging is not very efficient if you want to exclude row names that are not present in every data frame. Here's a different proposal.

First, three example data frames:

df1 <- data.frame(a = 1:5, b = 1:5, 
                  row.names = letters[1:5]) # letters a to e
df2 <- data.frame(a = 1:5, b = 1:5, 
                  row.names = letters[3:7]) # letters c to g
df3 <- data.frame(a = 1:5, b = 1:5, 
                  row.names = letters[c(1,2,3,5,7)]) # letters a, b, c, e, and g
# row names present in all three data frames: c and e

Put the data frames into a list:

dfList <- list(df1, df2, df3)

Find common row names:

idx <- Reduce(intersect, lapply(dfList, rownames))
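To see what Reduce does here, the call can be unrolled by hand. This is only a minimal sketch; it repeats the example data frames so that it runs on its own, and the name common is used purely for illustration:

```r
# Example data frames, repeated from above so the sketch is self-contained
df1 <- data.frame(a = 1:5, b = 1:5, row.names = letters[1:5])
df2 <- data.frame(a = 1:5, b = 1:5, row.names = letters[3:7])
df3 <- data.frame(a = 1:5, b = 1:5, row.names = letters[c(1, 2, 3, 5, 7)])
dfList <- list(df1, df2, df3)

# Reduce(intersect, ...) applies intersect pairwise, from left to right:
# first df1 with df2, then that result with df3
common <- intersect(intersect(rownames(df1), rownames(df2)), rownames(df3))

idx <- Reduce(intersect, lapply(dfList, rownames))

# Both approaches give the same character vector of shared row names
identical(common, idx)
```

The Reduce form scales to any number of data frames in the list, which is why it suits the five-data-frame case in the question.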

Extract data:

df1[idx, ]

  a b
c 3 3
e 5 5

PS. If you want to keep the corresponding rows from all data frames, stacked on top of each other, you could replace the last step, df1[idx, ], with the following command:

do.call(rbind, lapply(dfList, "[", idx, ))
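The comment under the question asks for one row per gene with separate columns for each sample. One possible way to get that is to bind the subsets side by side with cbind instead of rbind. The sketch below is not part of the original answer; the list names s1, s2 and s3 are hypothetical sample labels, chosen only so the resulting columns can be told apart:

```r
# Example data frames, repeated from above so the sketch is self-contained
df1 <- data.frame(a = 1:5, b = 1:5, row.names = letters[1:5])
df2 <- data.frame(a = 1:5, b = 1:5, row.names = letters[3:7])
df3 <- data.frame(a = 1:5, b = 1:5, row.names = letters[c(1, 2, 3, 5, 7)])

# s1, s2, s3 are hypothetical labels (assumption, not from the answer)
dfList <- list(s1 = df1, s2 = df2, s3 = df3)

idx <- Reduce(intersect, lapply(dfList, rownames))

# Subset every data frame to the common rows, in the same order;
# drop = FALSE keeps a data frame even if only one column is present
subsets <- lapply(dfList, function(d) d[idx, , drop = FALSE])

# Bind the subsets column-wise; because the list is named, each
# multi-column data frame gets its list name as a column prefix
combined <- do.call(cbind, subsets)
combined
```

Because every subset is indexed with the same idx, the rows line up by gene before they are bound together.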
