8
votes

I have three independent data.frames. The three data.frames have the same number of columns and the same number of rows. Additionally They have the same column names. I' m trying to merge the three data.frames according to column names. I'm using the following code wrote to merge two data.frames and return the number of matches.

Merged_DF = sapply(names(DF1),function(n) nrow(merge(DF1, DF2, by=n)))

The problem is that while in this example there are two data.frames, in my case I have 3 data.frames. How can I modify the code to merge three data.frames instead of two? I tried to modify the string in this way simply adding the third data.frame but it does not work:

  Merged_DF = sapply(names(DF1),function(n) nrow(merge(DF1, DF2, DF3,  by=n)))

It returns the following error:

 Error in fix.by(by.x, x) :  'by' must specify column(s) as numbers, names or logical

Ex:

DF1

 G1  G2  G3
  a   b   f
  b   c   a
  c   d   b

DF2

 G1  G2  G3
  A   b   f
  b   c   a
  h   M   b

DF3

 G1  G2  G3
  a   b   f
  b   l   a
  j   M   v

The data.frames have around 250 rows and 50 cols.

3
Gives us an example of your data.frames please.Arun

3 Answers

2
votes

After researching this very same question for a couple hours today, I came up with this simple but elegant solution using a combination of 'dplyr' pipes and the base R 'merge()' function.

MergedDF <- merge(DF1, DF2) %>%
              merge(DF3)

As you mention in your post, this assumes that the column names are the same and that there's the same number of rows in each data frame you are merging. This will also automatically eliminate any duplicate columns (i.e., identifiers) that were used in the merging process.

11
votes

You can use the Reduce function to merge multiple data frames:

df_list <- list(DF1, DF2, DF3)
Reduce(function(x, y) merge(x, y, all=TRUE), df_list, accumulate=FALSE)

Or merge_recurse from the reshape package:

library(reshape)
data <- merge_recurse(df_list)

See also the R Wiki: Merge data frames

0
votes

Just in case anyone wants to merge multiple data frames with the same column name but unequal row numbers, this article was helpful: https://medium.com/coinmonks/merging-multiple-dataframes-in-r-72629c4632a3

Basically, you use the do.call and rbind functions:

Merged <- do.call("rbind", list(df1, df2, df3, df4))