2
votes

Desired result: I would like to combine dataframes such that columns in DF2 that are in DF1 are row-binded to DF1; columns that are in DF1 but NOT in DF2 have nrow(DF2) zeros added to them; and columns that are in DF2 but NOT in DF1 have nrow(DF1) zeros prepended to them and then those columns are cbinded onto the DF.

What I've tried: I've tried using ncol, nrow, and matrix to generate the parts of the final df, but I'm running into issues, such as corner cases where the second DF only has one column and ncol(DF) returns NULL, for ex.

Template for desired result:

For ex, if I have these two DFs:

> DF_dummy_1
  a b c e
1 1 a f u
2 2 b g v
3 3 c h w
4 4 d i x
5 5 e j y

> DF_dummy_2
  a b d
1 1 p k
2 2 q l
3 3 r m
4 4 s n
5 5 t o

I would like to end up with

> DF_final
    a b c e d
1   1 a f u 0
2   2 b g v 0
3   3 c h w 0
4   4 d i x 0
5   5 e j y 0
6   1 p 0 0 k
7   2 q 0 0 l
8   3 r 0 0 m
9   4 s 0 0 n
10  5 t 0 0 o

Code to generate this ex:

DF_dummy_1 <- data.frame("a" = seq(1:5), "b" = letters[1:5], "c" = letters[6:10], "e" = letters[21:25])
DF_dummy_2 <- data.frame("a" = seq(6:10), "b" = letters[16:20], "d" = letters[11:15])
DF_final <- data.frame("a" = seq(1:10), "b" = c(letters[1:5],letters[16:20]),
                       "c" = c(letters[6:10], rep("0",5)), 
                       "e" = c(letters[21:25], rep("0",5)),
                       "d" = c(rep("0",5), letters[11:15]))
1

1 Answers

2
votes

You could try rbindlist with fill=TRUE option after placing the datasets in a list. (Just for convenience, named the datasets as 'df1', 'df2'), By default, 'NAs' will occupy the missing rows, which can be later replaced to 0.

library(data.table)#data.table_1.9.5
rbindlist(list(df1, df2), fill=TRUE)[,
         lapply(.SD, function(x) replace(x, is.na(x), 0))]
#    a b c e d
# 1: 1 a f u 0
# 2: 2 b g v 0
# 3: 3 c h w 0
# 4: 4 d i x 0
# 5: 5 e j y 0
# 6: 1 p 0 0 k
# 7: 2 q 0 0 l
# 8: 3 r 0 0 m
# 9: 4 s 0 0 n
#10: 5 t 0 0 o

Based on the 'data' updated in the post, some of the columns are 'factors'. You could use stringsAsFactors=FALSE in the data.frame(..) to avoid the character columns getting converted to factors. Considering these are factor columns

  rbindlist(list(DF_dummy_1, DF_dummy_2), fill=TRUE)[, 
    lapply(.SD, function(x) replace(as.character(x), is.na(x), 0))]
 #   a b c e d
 #1: 1 a f u 0
 #2: 2 b g v 0
 #3: 3 c h w 0
 #4: 4 d i x 0
 #5: 5 e j y 0
 #6: 1 p 0 0 k
 #7: 2 q 0 0 l
 #8: 3 r 0 0 m
 #9: 4 s 0 0 n
#10: 5 t 0 0 o