How to merge two dataframes and keep only different columns (content)?

Question

I have two data frames with the same number of rows but different numbers of columns. The column names also differ, but some columns may have identical content.

For example, df1:

df1<- data.frame("a"=c("0","1","0","1","0","0","0"),
                "b"=c("1","1","1","1","1","0","0"),
                "c"=c("1","1","0","0","1","0","0"),
                "d"=c("1","1","1","1","1","1","1"))

df2:

df2<- data.frame("e"=c("1","1","0","1","0","0","0"),
                "f"=c("1","1","1","1","1","0","0"),
                "g"=c("0","0","0","0","1","0","0"),
                "h"=c("0","0","0","0","1","1","1"))

Note that column "b" of df1 and column "f" of df2 are identical. The result I want is therefore a new data frame like this:

df3 <- data.frame("a"=c("0","1","0","1","0","0","0"),
                  "c"=c("1","1","0","0","1","0","0"),
                  "d"=c("1","1","1","1","1","1","1"),
                  "e"=c("1","1","0","1","0","0","0"),
                  "g"=c("0","0","0","0","1","0","0"),
                  "h"=c("0","0","0","0","1","1","1"))

NOTE: columns "b" and "f" (which are identical) are not in the new df3. I have searched the web but did not find an example of this. I think the main difficulty is that the merge is by content rather than by column name.

Could you not merge and then remove them using df3[, -c(2, 3)]? The numbers in the brackets indicate which columns to remove. Or do you want an all-in-one function for this? — Lime
Hi Lime, the problem is that my real data frames are much bigger than this simplified example (df1 is around 2000 rows by 10000 columns, and df2 is 2000 rows by 100 columns), so I cannot identify by eye which columns are identical. — marb_021

Ivn Ant · Accepted Answer · 2020-10-14T12:46:02

This would do the job:

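# Combine both data frames column-wise
df3 <- cbind(df1, df2)
# Transpose so columns become rows, drop every row that has a duplicate
# (checked in both directions), then transpose back
df3 <- t(t(df3)[!(duplicated(t(df3)) | duplicated(t(df3), fromLast = TRUE)), ])
df3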

#  a c d e g h
#1 0 1 1 1 0 0
#2 1 1 1 1 0 0
#3 0 0 1 0 0 0
#4 1 0 1 1 0 0
#5 0 1 1 0 1 1
#6 0 0 1 0 0 1
#7 0 0 1 0 0 1

This gives you a matrix; you can convert the result to a data frame with as.data.frame(df3) if desired.
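If the result should stay a data frame, one variant of the same idea is to compare the columns as list elements instead of transposing. This is a minimal sketch, not part of the original answer: the names combined, cols, and dup are illustrative, and it assumes that duplicated() on a list compares whole columns, so columns with equal-looking values but different types may not count as equal.

```r
# Example data from the question
df1 <- data.frame(a = c("0","1","0","1","0","0","0"),
                  b = c("1","1","1","1","1","0","0"),
                  c = c("1","1","0","0","1","0","0"),
                  d = c("1","1","1","1","1","1","1"))
df2 <- data.frame(e = c("1","1","0","1","0","0","0"),
                  f = c("1","1","1","1","1","0","0"),
                  g = c("0","0","0","0","1","0","0"),
                  h = c("0","0","0","0","1","1","1"))

# Combine column-wise, then treat each column as one list element
combined <- cbind(df1, df2)
cols <- as.list(combined)

# Flag every column whose content also appears in another column
# (scanning from both ends marks both members of a matching pair)
dup <- duplicated(cols) | duplicated(cols, fromLast = TRUE)

# Keep only the unflagged columns; drop = FALSE keeps a data frame
# even if a single column remains
df3 <- combined[, !dup, drop = FALSE]

# Names of the columns that were removed
names(combined)[dup]
```

Like the answer above, this drops every column that has an identical twin anywhere in the combined data, including two matching columns inside the same data frame.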
