3
votes

I want to combine two distinct dataframes(completely different columns) in R, into one inclusive data frame.

Lets say data frame "df_a" has columns A & B:

df_a <- read.table(header=TRUE, text='
    A   B
    1   x1
    2   y1
    3   z1
')

And dataframe "df_b" has columns C & D.

df_b <- read.table(header=TRUE, text='
          C     D
          1     6.7
          1     4.5
          1     3.7
          2     3.3
          2     4.1
          2     5.2
')

Therefore the resultant dataframe "df_c" will have columns A,B,C,D, see below:

df_c
  A     B       C   D
1 1     x1      1   6.7
2 2     y1      1   4.5
3 3     z1      1   3.7
4 NA    NA      2   3.3
5 NA    NA      2   4.1
6 NA    NA      2   5.2

Approach #1:

I first tried using rbind() but that function requires matching column names, however that is not what I'm looking for.

Approach #2:

I used df_c <- merge(df_a,df_b), however merge seems to be doing a Cartesian product, see below:

df_c <-  merge(df_a,df_b)
df_c
   A  B C   D
1  1 x1 1 6.7
2  2 y1 1 6.7
3  3 z1 1 6.7
4  1 x1 1 4.5
5  2 y1 1 4.5
6  3 z1 1 4.5
7  1 x1 1 3.7
8  2 y1 1 3.7
9  3 z1 1 3.7
10 1 x1 2 3.3
11 2 y1 2 3.3
12 3 z1 2 3.3
13 1 x1 2 4.1
14 2 y1 2 4.1
15 3 z1 2 4.1
16 1 x1 2 5.2
17 2 y1 2 5.2
18 3 z1 2 5.2

Approach #3:

Than I used df_c <- union(df_a,df_b), but the result is no longer a data frame. Its turned into a list of lists, see below:

[[1]]
[1] 1 2 3

[[2]]
[1] x1 y1 z1
Levels: x1 y1 z1

[[3]]
[1] 1 1 1 2 2 2

[[4]]
[1] 6.7 4.5 3.7 3.3 4.1 5.2

Approach #4

I created my own function called unionNoMatch(), that attempts to append columns from df_2 to df_1 input paramters:

unionNoMatch <- function(df_1, df_2)
{
df_3 <- df_1; 
for (name in names(df_2))
{
cbind(df_2$name,df_3) 
}

return (df_3);

}

df_c <- unionNoMatch (df_a,df_b)

However, I got this error:

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 0, 3

How can I achieve my task of combining 2 data frames with non matching columns into a single data frame?

Thanks

3
So you literally have no way of matching the rows in df_a to those of df_b even though the number of rows differs?MichaelChirico
This is yet another cbind.fill question, one of which was asked earlier today and closed as a duplicate of one of the many other versions of this question. Just google cbind.filljoran

3 Answers

6
votes

R treats variables on the same row as related, so it doesn't want to put things on the same row unless it is told you want them there. In general, this is to prevent mistakes. If you know what you are doing, you can manually give an id to merge by:

df_a$row <- 1:nrow(df_a)
df_b$row <- 1:nrow(df_b)
merge(df_a, df_b, by = "row", all = TRUE)
5
votes

Seems like you're trying to do something that's probably not recommended, but here's what I'd do in data.table:

library(data.table) #1.9.5+ to get the on argument to [.data.table
setDT(df_a,keep.rownames=T); setDT(df_b,keep.rownames=T)
> df_a[df_b,on="rn"]
   rn  A  B C   D
1:  1  1 x1 1 6.7
2:  2  2 y1 1 4.5
3:  3  3 z1 1 3.7
4:  4 NA NA 2 3.3
5:  5 NA NA 2 4.1
6:  6 NA NA 2 5.2

(basically, we find something to merge on, namely the row number, then merge on that)

2
votes

We can use add_rownames from dplyr to create a row name column in each of the datasets and then do full_join/left_join

library(dplyr)
full_join(add_rownames(df_a) , add_rownames(df_b), by='rowname') %>% 
                    select(-rowname)
#   A    B C   D
#1  1   x1 1 6.7
#2  2   y1 1 4.5
#3  3   z1 1 3.7
#4 NA <NA> 2 3.3
#5 NA <NA> 2 4.1
#6 NA <NA> 2 5.2