1
votes

I have a dataframe like this:

ID  s1  e1  s2  e2
A   50  150 80  180
A   160 350 280 470
A   355 700 800 1150
B   100 500 150 550
B   550 1500    800 1750

When the ID is identical I would like to calculate the difference between values in consecutive rows but different columns (for ID A: s1 in row2 minus e1 in row1; s1 in row3 minus e1 in row2; s2 in row2 minus e2 in row1; s2 in row3 minus e2 in row2) and add these values to a new column (diff1 and diff2).

The dataframe would then look like this:

ID  s1  e1  s2  e2  diff1   diff2
A   50  150 80  180     
A   160 350 280 470 10  100
A   355 700 800 1150    5   330
B   100 500 150 550     
B   550 1500    800 1750    50  250

Is this possible?

Thank you in advance

WD

1

1 Answers

3
votes

After grouping by 'ID', get the lead of 's1', subtract it from 'e1', and create 'diff1' as the lag of this output. Similarly, the 'diff2' can be created the corresponding pairs of 's2' and 'e2' columns

library(dplyr)
df1 %>%
    group_by(ID) %>%
    mutate(diff1 = lag(lead(s1) - e1), diff2 = lag(lead(s2)- e2))
# A tibble: 5 x 7
# Groups: ID [2]
#   ID       s1    e1    s2    e2 diff1 diff2
#   <chr> <int> <int> <int> <int> <int> <int>
#1 A        50   150    80   180    NA    NA
#2 A       160   350   280   470    10   100
#3 A       355   700   800  1150     5   330
#4 B       100   500   150   550    NA    NA
#5 B       550  1500   800  1750    50   250

If there are multiple 's', 'e' pairs, one option with data.table would be to melt it to 'long' format and then dcast to 'wide' after doing the necessary calculation

library(data.table)
dnew <- dcast(melt(setDT(df1, keep.rownames = TRUE),
 measure = patterns("^s\\d+", "^e\\d+"), value.name = c("s", "e"))[, 
  diffs := shift(shift(s, type = "lead") - e), .(ID, variable)][],
           rn + ID ~ paste0('diff', variable), value.var = 'diffs')
df1[, names(dnew)[3:4] := dnew[, 3:4, with = FALSE]][, rn := NULL][]
#   ID  s1   e1  s2   e2 diff1 diff2
#1:  A  50  150  80  180    NA    NA
#2:  A 160  350 280  470    10   100
#3:  A 355  700 800 1150     5   330
#4:  B 100  500 150  550    NA    NA
#5:  B 550 1500 800 1750    50   250