How to calculate the difference between different data frames with common column names

Question

I have three data frames and trying to calculate the difference between two data frames (Df2 and Df3) conditioned by data frame 1. As explained in following example I have three data frames, Df1, Df2 and Df3 with common names. In first step, in Df1, I want to compare the values of “standard” column with all three columns, “Das”,”Dss” and ”Tri” probably row wise and where ever any value of these columns, “Das”, “Dss” and “Tri” is higher than the “Standard” in Df1, calculate the difference of same position in Df2 and Df3 and put the difference in a separate column.

Df1             
    Names   Standard    Das Dss Tri
    Aa  3   3   6   2
    Ab  4   6   4   3
    Ac  2   5   2   4
    Ad  4   3   3   8
    Ae  6   4   5   7
    Af  4   5   7   5
    Ag  2   6   8   2
    Ah  9   7   6   2

Df2         
    Names   Das Dss Tri
    Aa  4   2   5
    Ab  7   5   4
    Ac  5   7   2
    Ad  6   4   3
    Ae  5   3   5
    Af  3   2   6
    Ag  2   5   4
    Ah  4   6   3

Df3

Names   Das Dss Tri
    Aa  5   3   5
    Ab  8   5   4
    Ac  6   7   2
    Ad  6   4   7
    Ae  5   3   8
    Af  4   5   6
    Ag  1   5   4
    Ah  4   6   3

Final Ouput

Df3             
    Names   Das Dss Tri Difference
    Aa  5   3   5   -1
    Ab  8   5   4   -1
    Ac  6   7   2   -1
    Ad  6   4   7   -4
    Ae  5   3   8   -3
    Af  4   5   6   -4
    Ag  1   5   4   1
    Ah  4   6   3   0

@Arun yes that is true these are the situations in my data. I am trying to figure out for past whole week but I am not able to figure out. You pointed out correctly. I was thinking that "sum" could be used if more than 1 column is greater. — Gongon
Looks like you want the first which is greater than Standard. — Matthew Lundberg

Arun Arun · Accepted Answer · 2013-01-01T18:34:35

Here's the script that takes the index of the first biggest value if more than 1 value is found and if no values are found, NA is returned.

df1 <- structure(list(standard = c(3, 4, 2, 4, 6, 4, 2, 9), das = c(3, 
6, 5, 3, 4, 5, 6, 7), dss = c(6, 4, 2, 3, 5, 7, 8, 6), tri = c(2, 
3, 4, 8, 7, 5, 2, 2)), .Names = c("standard", "das", "dss", "tri"
), row.names = c(NA, -8L), class = "data.frame")

df2 <- structure(list(das = c(4, 7, 5, 6, 5, 3, 2, 4), dss = c(2, 
5, 7, 4, 3, 2, 5, 6), tri = c(5,4,2,3,5,6,4,3)), .Names = c("das", "dss", "tri"
), row.names = c(NA, -8L), class = "data.frame")

df3 <- structure(list(das = c(5, 8, 6, 6, 5, 4, 1, 4), dss = c(3, 
     5, 7, 4, 3, 5, 5, 6), tri = c(5,4,2,7,8,6,4,3)), .Names = c("das", "dss", "tri"
 ), row.names = c(NA, -8L), class = "data.frame")

# get indices. run through every row of df1
# and get the maximum column index > standard
idx.v <- sapply( 1:nrow(df1), function(idx) {
    t <- which(df1[idx, 2:4] > df1[idx, 1])
})

df3$result <- sapply(1:length(idx.v), function(ix) {
    col.idx <- idx.v[[ix]]
    len.idx <- length(col.idx)
    if (len.idx > 0) {
        res <- sum(df2[ix, col.idx] - df3[ix, col.idx])
    } else {
        res <- NA
    }
})

Output:

> df3
  das dss tri result
1   5   3   5     -1
2   8   5   4     -1
3   6   7   2     -1
4   6   4   7     -4
5   5   3   8     -3
6   4   5   6     -4
7   1   5   4      1
8   4   6   3     NA

Thanks for the chat. This is what you require.

How to calculate the difference between different data frames with common column names

2 Answers