0
votes

I would like to compute the difference between columns in two data frames. The data frames have a different total number of columns and the column names between the data frames have a similar pattern. I would like to compute the difference between similarly named columns.

I would appreciate some hints on how to start thinking about executing this in R or some example code.

Here is a sample of what the data frames look like:

DF1

w_H_11_XA    w_H_13_XA    w_H_16_XA    w_13_03_XA    w_13_12_XA
10           12          1                8           12
11           11          8                6           19

DF2

w_H_11_BA    w_H_16_BA     w_13_12_BA
8            1            10
9            4            9

So here both data sets have columns w_H_11*, w_H_16*, and w_13_12* 'in common', meaning they have similar patterns in the column names. I would like to produce a data set which takes the difference between the similarly matched columns only. Like so:

w_H_11    w_H_16    w_13_12
2          0         2
2          4         10

I have thought about merging the data frames and arranging the columns in order by name; however, I am not sure how to automate computing the difference. The actual data set has a few hundred columns.

Would appreciate any feedback.

2

2 Answers

0
votes

If the difference between the two names are just the last character, then we could use adist

 a = which(adist(names(DF1),names(DF2))==1,T) 
 result = DF1[,a[,1]]-DF2[,a[,2]]
 setNames(result,sub("_[A-Z]$",'',names(result)))
  w_H_11 w_H_16 w_13_12
1      2      0       2
2      2      4      10

with the updated table, it seems we delete all the letters to the end thus you could do:

a = which(do.call(adist,lapply(list(names(DF1),names(DF2)),sub,pat="_[^_]*$",rep=""))==0,T) and the rest remains

0
votes

An option would be to remove the suffix part from the column names, then do the intersect and use that to subset the columns of the data.frame and get the diffference

nm1 <- sub("_[A-Z]$", "", names(DF1))
nm2 <- sub("_[A-Z]$", "", names(DF2))
nm3 <- intersect(nm1, nm2)
nm4 <- paste(nm3, collapse="|")
out <- DF1[grep(nm4, names(DF1))] - DF2[grep(nm4, names(DF2))]
names(out) <- sub("_[A-Z]$", "", names(out))
out
#  w_H_11 w_H_16 w_13_12
#1      2      0       2
#2      2      4      10

data

DF1 <- structure(list(w_H_11_A = 10:11, w_H_13_A = 12:11, w_H_16_A = c(1L, 
8L), w_13_03_A = c(8L, 6L), w_13_12_A = c(12L, 19L)), class = "data.frame",
row.names = c(NA, 
-2L))

DF2 <- structure(list(w_H_11_B = 8:9, w_H_16_B = c(1L, 4L), 
    w_13_12_B = 10:9), class = "data.frame", row.names = c(NA, 
-2L))