R interleave two data frames with same column names

Question

I have two data tables:

before represents a data table in its "raw" state (prior to any cleaning operations).
after represents a data table after various cleaning and manipulation.

They have mostly matching column names.

Is it possible to construct a third data frame where columns with matching names are placed adjacent to one another and the names possibly modified (name.before, name.after) and all excess columns are placed at the end?

For example:

before data frame:

data.table::data.table(a = c(1,2,3), b = c(1,2,3), c = c(1,2,3))

   a b c
1: 1 1 1
2: 2 2 2
3: 3 3 3

after data frame:

data.table::data.table(a = c("a","b","c"), c = c("a","b","c"), d = c(1,2,3))

   a c d
1: a a 1
2: b b 2
3: c c 3

The desired output would be:

   a.before a.after c.before c.after d
1:        1       a        1       a 1
2:        2       b        2       b 2
3:        3       c        3       c 3

The purpose of this would be for easy comparison of identical columns to verify that column outputs are appropriate after various functions have been applied to the data.table.

akrun · Accepted Answer · 2020-09-20T21:03:41

One option is to cbind the two tables, reorder the columns with setcolorder using the order of the concatenated column names, and then apply make.unique so the duplicated names can be told apart. Here dt1 is the before table and dt2 is the after table.

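library(data.table)
# cbind both tables, then reorder by position so equal names sit side by side
out <- setcolorder(cbind(dt1, dt2), order(c(names(dt1), names(dt2))))[]
# make.unique appends .1 to the second occurrence of each duplicated name
setnames(out, make.unique(names(out)))[]
# drop the columns that exist only in dt1 (here: b)
out[, setdiff(names(dt1), names(dt2)) := NULL][]
#   a a.1 c c.1 d
#1: 1   a 1   a 1
#2: 2   b 2   b 2
#3: 3   c 3   c 3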
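
To see why the columns end up paired, the intermediate values can be inspected with base R alone. This is only an illustrative sketch: the vector nm is typed in by hand to stand for names(cbind(dt1, dt2)) with the example tables from the question.

```r
# Column names as they appear after cbind(dt1, dt2) for the example tables
# (written out by hand here; an assumption for illustration).
nm <- c("a", "b", "c", "a", "c", "d")

# order() is stable, so for each duplicated name the dt1 column
# (earlier position) comes before the dt2 column.
ord <- order(nm)
ord
# 1 4 2 3 5 6

nm[ord]
# "a" "a" "b" "c" "c" "d"

# make.unique() leaves the first occurrence alone and suffixes later ones.
make.unique(nm[ord])
# "a" "a.1" "b" "c" "c.1" "d"
```

Because the positions come from an alphabetical sort, the unmatched column d lands at the end only because "d" sorts last in this example.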

If the names specifically need the .before/.after suffixes:

out <- setcolorder(cbind(dt1, dt2), order(c(names(dt1), names(dt2))))[]
out[, setdiff(names(dt1), names(dt2)) := NULL][]
# i1 flags the first copy of each duplicated name (from dt1),
# i2 flags the second copy (from dt2); compute both before renaming
i1 <- duplicated(names(out), fromLast = TRUE)
i2 <- duplicated(names(out))
names(out)[i1] <- paste0(names(out)[i1], ".before")
names(out)[i2] <- paste0(names(out)[i2], ".after")

out
#   a.before a.after c.before c.after d
#1:        1       a        1       a 1
#2:        2       b        2       b 2
#3:        3       c        3       c 3
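
If the unmatched columns should go at the end regardless of how their names sort, the column order can be built explicitly instead of relying on order(). The following is a minimal sketch, not part of the answer above: the helper name interleave_cols and its suffix argument are hypothetical, and it assumes both tables have the same number of rows and that the suffixed names do not clash with existing columns.

```r
library(data.table)

# Hypothetical helper (not a data.table function): pair up shared columns,
# then append the columns that exist in only one of the two tables.
interleave_cols <- function(before, after, suffix = c(".before", ".after")) {
  stopifnot(nrow(before) == nrow(after))
  shared      <- intersect(names(before), names(after))
  only_before <- setdiff(names(before), shared)
  only_after  <- setdiff(names(after), shared)

  # Work on copies so the inputs are not renamed by reference.
  b <- copy(before)
  a <- copy(after)
  setnames(b, shared, paste0(shared, suffix[1]))
  setnames(a, shared, paste0(shared, suffix[2]))

  out <- cbind(b, a)
  # c(rbind(x, y)) alternates the elements of x and y.
  new_order <- c(rbind(paste0(shared, suffix[1]), paste0(shared, suffix[2])),
                 only_before, only_after)
  setcolorder(out, new_order)
  out[]
}

before <- data.table(a = c(1, 2, 3), b = c(1, 2, 3), c = c(1, 2, 3))
after  <- data.table(a = c("a", "b", "c"), c = c("a", "b", "c"), d = c(1, 2, 3))
res <- interleave_cols(before, after)
names(res)
# "a.before" "a.after" "c.before" "c.after" "b" "d"
```

Unlike the code above, this sketch keeps b; leaving only_before out of new_order and dropping those columns would reproduce the output shown in the question.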
