Matching timestamps in two dataframes using dplyr

Question

So I have a dataframe like this

DF1

ID    timestamp              value
1     2018-07-31 20:00:00    45555
1     2018-08-01 06:50:00    333
1     2018-08-01 07:00:00    322
2     2018-08-01 13:00:00    1222
2     2018-08-01 22:00:00    1111
3     2018-08-01 11:00:00    0
3     2018-08-02 08:00:00    22222

And another one like this

DF2

          startTime             endTime   ID    
2018-08-01 06:45:00 2018-08-01 09:49:00   1
2019-12-02 11:01:00 2019-12-02 11:02:00   2
2018-08-02 07:59:00 2018-08-02 08:50:00   2
2018-08-02 07:59:00 2018-08-02 08:50:00   3

So I want to create a new column using mutate which mutates DF2 that tells me if any time in between the startTime and endTime appears in DF1 that matches the ID.

So in DF2 if we have a date like 8/2/2018 and the time span is 7:59AM - 8:50AM , if any timestamp within this interval inclusive (2018-08-02 7:59:00, 2018-08-02 8:00:00, 2018-08-02 8:01:00...) appears in DF1 then a variable in DF2 reads 1, else 0. This is only the case if the ID matches. Does not matter how many times, atleast once is what matters.

Final version of DF2 should look like.

          startTime             endTime   ID   match
2018-08-01 06:45:00 2018-08-01 09:49:00   1    1
2019-12-02 11:01:00 2019-12-02 11:02:00   2    0
2018-08-02 07:59:00 2018-08-02 08:50:00   2    0
2018-08-02 07:59:00 2018-08-02 08:50:00   3    1

Ideally I would like to use BETWEEN() and remain in the tidyverse.

I think this can be better solved in data.table i.e. more efficiently — akrun
DF1 is a postgresql database btw; and both classes are POSIXct btw. — John Thomas
@akrun mutate is what makes sense to me but if you see a better alternative I'm open to suggestions — John Thomas

akrun akrun · Accepted Answer · 2020-07-01T20:48:43

We could use a data.table non-equi join on the DateTime columns along with 'ID' column, create the match by checking if the number of rows (.N) is greater than 0

library(data.table)
DF2$match <- 0L
setDT(DF2)[DF1,  match := +(.N > 0),on = .(ID, 
        startTime < timestamp, endTime > timestamp), by = .EACHI]
DF2
#             startTime             endTime ID match
#1: 2018-08-01 06:45:00 2018-08-01 09:49:00  1     1
#2: 2019-12-02 11:01:00 2019-12-02 11:02:00  2     0
#3: 2018-08-02 07:59:00 2018-08-02 08:50:00  2     0
#4: 2018-08-02 07:59:00 2018-08-02 08:50:00  3     1

data

DF1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
 timestamp = structure(c(1533085200, 
1533124200, 1533124800, 1533146400, 1533178800, 1533139200, 1533214800
), class = c("POSIXct", "POSIXt"), tzone = ""), value = c(45555L, 
333L, 322L, 1222L, 1111L, 0L, 22222L)), row.names = c(NA, -7L
), class = "data.frame")

DF2 <- structure(list(startTime = structure(c(1533123900, 1575306060, 
1533214740, 1533214740), class = c("POSIXct", "POSIXt"), tzone = ""), 
    endTime = structure(c(1533134940, 1575306120, 1533217800, 
    1533217800), class = c("POSIXct", "POSIXt"), tzone = ""), 
    ID = c(1L, 2L, 2L, 3L)), row.names = c(NA, -4L), class = "data.frame")

Matching timestamps in two dataframes using dplyr

2 Answers

data