How to calculate difference between columns in different data frames with similar pattern in column names?

Question

I would like to compute the difference between columns in two data frames. The data frames have a different total number of columns and the column names between the data frames have a similar pattern. I would like to compute the difference between similarly named columns.

I would appreciate some hints on how to start thinking about executing this in R or some example code.

Here is a sample of what the data frames look like:

DF1

w_H_11_XA    w_H_13_XA    w_H_16_XA    w_13_03_XA    w_13_12_XA
10           12           1            8             12
11           11           8            6             19

DF2

w_H_11_BA    w_H_16_BA    w_13_12_BA
8            1            10
9            4            9

So here both data sets have columns w_H_11*, w_H_16*, and w_13_12* 'in common', meaning they have similar patterns in the column names. I would like to produce a data set which takes the difference between the similarly matched columns only. Like so:

w_H_11    w_H_16    w_13_12
2         0         2
2         4         10

I have thought about merging the data frames and arranging the columns in order by name; however, I am not sure how to automate computing the difference. The actual data set has a few hundred columns.

Would appreciate any feedback.

Onyambu Onyambu · Accepted Answer · 2019-08-20T17:20:03

If the matching names differ in just one character, we can use adist, which computes the edit distance between every pair of names:

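 # row/column index pairs of names that are exactly one edit apart
 a = which(adist(names(DF1), names(DF2)) == 1, arr.ind = TRUE)
 # subtract the matched columns
 result = DF1[, a[, 1]] - DF2[, a[, 2]]
 # drop the trailing suffix from the result names
 setNames(result, sub("_[A-Z]$", '', names(result)))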
  w_H_11 w_H_16 w_13_12
1      2      0       2
2      2      4      10
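
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The data.frame() calls are my reconstruction of the sample in the question, and the intermediate name d is only for illustration. One substitution to be aware of: the names as shown end in two letters (_XA, _BA), which "_[A-Z]$" does not match, so the renaming step below uses "_[^_]*$" to strip everything from the last underscore onward.

```r
# Sample data rebuilt by hand from the question.
DF1 <- data.frame(w_H_11_XA = c(10, 11), w_H_13_XA = c(12, 11),
                  w_H_16_XA = c(1, 8), w_13_03_XA = c(8, 6),
                  w_13_12_XA = c(12, 19))
DF2 <- data.frame(w_H_11_BA = c(8, 9), w_H_16_BA = c(1, 4),
                  w_13_12_BA = c(10, 9))

# Edit distance between every DF1 name (rows) and every DF2 name (columns).
d <- adist(names(DF1), names(DF2))

# Row/column index pairs where the two names are exactly one edit apart.
a <- which(d == 1, arr.ind = TRUE)

# Subtract the matched columns; drop = FALSE keeps a data frame
# even if only one pair of columns matches.
result <- DF1[, a[, 1], drop = FALSE] - DF2[, a[, 2], drop = FALSE]

# Strip the last underscore-delimited chunk (e.g. _XA) from the names.
result <- setNames(result, sub("_[^_]*$", "", names(result)))
print(result)
```

Matching on an edit distance of 1 relies on no two unrelated names being that close to each other, which is worth checking when there are a few hundred columns.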

With the updated table, it looks like everything after the last underscore should be dropped before comparing the names, so you could do:

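 a = which(do.call(adist, lapply(list(names(DF1), names(DF2)), sub, pattern = "_[^_]*$", replacement = "")) == 0, arr.ind = TRUE)

The rest remains the same, with the same pattern "_[^_]*$" used in the final sub() call so that the whole suffix is stripped from the result names.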
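
Since a distance of 0 just means the stripped names are identical, plain exact matching should give the same pairs. Below is a minimal sketch of that variant using intersect() and match() in place of adist; this is a swapped-in technique, not the code above. The helper strip_suffix and the names key1, key2, common and diffs are hypothetical, and the sketch assumes the stripped names are unique within each data frame.

```r
# Sample data rebuilt by hand from the question.
DF1 <- data.frame(w_H_11_XA = c(10, 11), w_H_13_XA = c(12, 11),
                  w_H_16_XA = c(1, 8), w_13_03_XA = c(8, 6),
                  w_13_12_XA = c(12, 19))
DF2 <- data.frame(w_H_11_BA = c(8, 9), w_H_16_BA = c(1, 4),
                  w_13_12_BA = c(10, 9))

# Hypothetical helper: drop everything from the last underscore onward.
strip_suffix <- function(x) sub("_[^_]*$", "", x)

key1 <- strip_suffix(names(DF1))
key2 <- strip_suffix(names(DF2))

# match() only finds the first occurrence of a key, so stop early
# if a stripped name appears twice in either data frame.
stopifnot(!anyDuplicated(key1), !anyDuplicated(key2))

# Keys present in both data frames, in DF1's column order.
common <- intersect(key1, key2)

# Pick the matching columns from each side and subtract them.
diffs <- DF1[, match(common, key1), drop = FALSE] -
         DF2[, match(common, key2), drop = FALSE]
names(diffs) <- common
print(diffs)
```

This avoids building the full distance matrix, at the cost of requiring the stripped names to match exactly.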
