0
votes

Looking to classify an existing event date, based on if it falls within 2 dates in another column

My Raw data looks like this:

study_id Event Event_Date Event_Result Pre1 CheckUp Post1 Pre2 CheckUp2 Post2 Pre3 CheckUp3 Post3 Pre4 CheckUp4 Post4 Pre5 CheckUp5 Post5
1 event1 5/4/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
1 event1 5/15/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
1 event1 6/5/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
1 event1 7/3/2012 0.8 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
2 event2 8/14/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
2 event2 9/11/2012 1.2 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
2 event1 9/21/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
3 event1 10/9/2012 1.1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
3 event1 10/23/2012 1.1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
3 event2 10/25/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
4 event2 11/2/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017
4 event1 11/13/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017

Expected Outcome I am hoping to have a column for each interval rule (see what I have tried section below for interval details, but essentially a total of 10 extra columns), and if the event_Date falls within Interval rule, mark it as TRUE, otherwise false. I was able to do this in Excel, but looking for a solution in R.

See example below.

personID Event_Date Event_Result Pre1 CheckUp Post1 Pre2 CheckUp2 Post2 Pre3 CheckUp3 Post3 Pre4 CheckUp4 Post4 Pre5 CheckUp5 Post5 Interval1 Interval2 Interval3 Interval4 Interval5 Interval6 Interval7 Interval8 Interval9 Interval10
1 5/4/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
1 5/15/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
1 6/5/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
2 7/3/2012 0.8 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
2 8/14/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
3 9/11/2012 1.2 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
3 9/21/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
3 10/9/2012 1.1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
4 10/23/2012 1.1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
4 10/25/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
4 11/2/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
5 11/13/2012 1 7/23/2012 10/23/2012 1/23/2013 11/25/2013 2/25/2014 5/25/2014 8/1/2014 11/1/2014 2/1/2015 7/4/2015 10/4/2015 1/4/2016 7/4/2016 10/4/2016 1/4/2017 FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

What I have Tried I have tried to create the intervals to look at the Event_Result and see if it falls within a Range based on the Post, Pre and CheckUp dates. The intervals rules I care about are the following 10:

Interval1 = If EventDate >= Pre1 AND EvenDate <= CheckUp

Interval2 = If EventDate > CheckUp AND EvenDate <= Post1

Interval3 = If EventDate >= Pre2 AND EvenDate <= CheckUp2

Interval4 = If EventDate > CheckUp2 AND EvenDate <= Post2

*

*

Interval9 = If EventDate >= Pre5 AND EventDate <= CheckUp5

Interval10 = If EventDate > CheckUp5 AND EventDate <= Post5

I tried using LUBRIDATE with no luck. As I mentioned I was able to do this in Excel, but just looking for an approach in R (open to LUBRIDATE, DPLYR, or other libraries)

INTERVAL_1 <- interval(ymd(data_merged$Pre1), ymd(data_merged$EventDate))

2

2 Answers

0
votes

Here is another approach to try.

First, make sure your dates are in Date format. Rename CheckUp to CheckUp1 so consistent with other columns.

Then, you can try pivoting your data into long form with pivot_longer, and use group_by to look at event dates. In this case, a column was created to evaluate if event date was in interval between pre and checkup, and a second column for checkup to post.

After that, you can put your result into wide format again, and join with original data.

library(tidyverse)

prep_data <- raw_data %>%
  mutate(across(starts_with(c("Pre", "CheckUp", "Post")), as.Date, format = "%m/%d/%Y"),
         Event_Date = as.Date(Event_Date, format = "%m/%d/%Y")) %>%
  rename(CheckUp1 = CheckUp)

prep_data %>%
  pivot_longer(cols = starts_with(c("Pre", "CheckUp", "Post")), names_to = c("Event_Type", "Number"), names_pattern = "(\\w+)(\\d+)") %>%
  group_by(study_id, Event, Event_Date, Event_Result, Number) %>%
  mutate(interval_pre = Event_Date >= value[Event_Type == "Pre"] & Event_Date <= value[Event_Type == "CheckUp"],
         interval_post = Event_Date > value[Event_Type == "CheckUp"] & Event_Date <= value[Event_Type == "Post"]) %>%
  pivot_wider(id_cols = c(study_id, Event, Event_Date, Event_Result), names_from = Number, values_from = c("interval_pre", "interval_post"), values_fn = first) %>%
  right_join(prep_data %>% select(study_id, Event, Event_Date, Event_Result))

Output

   study_id Event  Event_Date Event_Result interval_pre_1 interval_pre_2 interval_pre_3 interval_pre_4 interval_pre_5 interval_post_1 interval_post_2 interval_post_3 interval_post_4 interval_post_5
      <int> <chr>  <date>            <dbl> <lgl>          <lgl>          <lgl>          <lgl>          <lgl>          <lgl>           <lgl>           <lgl>           <lgl>           <lgl>          
 1        1 event1 2012-05-04          1   FALSE          FALSE          FALSE          FALSE          FALSE          FALSE           FALSE           FALSE           FALSE           FALSE          
 2        1 event1 2012-05-15          1   FALSE          FALSE          FALSE          FALSE          FALSE          FALSE           FALSE           FALSE           FALSE           FALSE          
 3        1 event1 2012-06-05          1   FALSE          FALSE          FALSE          FALSE          FALSE          FALSE           FALSE           FALSE           FALSE           FALSE          
 4        1 event1 2012-07-03          0.8 FALSE          FALSE          FALSE          FALSE          FALSE          FALSE           FALSE           FALSE           FALSE           FALSE          
 5        2 event2 2012-08-14          1   TRUE           FALSE          FALSE          FALSE          FALSE          FALSE           FALSE           FALSE           FALSE           FALSE          
 6        2 event2 2012-09-11          1.2 TRUE           FALSE          FALSE          FALSE          FALSE          FALSE           FALSE           FALSE           FALSE           FALSE          
 7        2 event1 2012-09-21          1   TRUE           FALSE          FALSE          FALSE          FALSE          FALSE           FALSE           FALSE           FALSE           FALSE          
 8        3 event1 2012-10-09          1.1 TRUE           FALSE          FALSE          FALSE          FALSE          FALSE           FALSE           FALSE           FALSE           FALSE          
 9        3 event1 2012-10-23          1.1 TRUE           FALSE          FALSE          FALSE          FALSE          FALSE           FALSE           FALSE           FALSE           FALSE          
10        3 event2 2012-10-25          1   FALSE          FALSE          FALSE          FALSE          FALSE          TRUE            FALSE           FALSE           FALSE           FALSE          
11        4 event2 2012-11-02          1   FALSE          FALSE          FALSE          FALSE          FALSE          TRUE            FALSE           FALSE           FALSE           FALSE          
12        4 event1 2012-11-13          1   FALSE          FALSE          FALSE          FALSE          FALSE          TRUE            FALSE           FALSE           FALSE           FALSE 
0
votes
library(lubridate)
library(dplyr)
df <- data.frame(
  person_id = c(1, 1, 1, 2, 2, 3, 3, 3),
  event_date = ymd(c("20120504",
                     "20120515",
                     "20120605",
                     "20120703",
                     "20120814",
                     "20120911",
                     "20120921",
                     "20121009"))
)

pre1 <- ymd("20120723")
checkup <- ymd("20121023")
post1 <- ymd("20130123")

interval1 <- interval(pre1, checkup)
interval2 <- interval(checkup, post1)

df %>%
  mutate(
    int1 = event_date %within% interval1,
    int2 = event_date %within% interval2
  )
  person_id event_date  int1  int2
1         1 2012-05-04 FALSE FALSE
2         1 2012-05-15 FALSE FALSE
3         1 2012-06-05 FALSE FALSE
4         2 2012-07-03 FALSE FALSE
5         2 2012-08-14  TRUE FALSE
6         3 2012-09-11  TRUE FALSE
7         3 2012-09-21  TRUE FALSE
8         3 2012-10-09  TRUE FALSE

another option

library(lubridate)
library(dplyr)
df <- data.frame(
  person_id = c(1, 1, 1, 2, 2, 3, 3, 3),
  event_date = ymd(c("20120504",
                     "20120515",
                     "20120605",
                     "20120703",
                     "20120814",
                     "20120911",
                     "20120921",
                     "20121009")),
  pre1 = ymd("20120723"),
  checkup = ymd("20121023"),
  post1 = ymd("20130123")
)

df %>%
  mutate(
    int1 = event_date %within% interval(pre1, checkup),
    int2 = event_date %within% interval(checkup, post1)
  )
  person_id event_date       pre1    checkup      post1  int1  int2
1         1 2012-05-04 2012-07-23 2012-10-23 2013-01-23 FALSE FALSE
2         1 2012-05-15 2012-07-23 2012-10-23 2013-01-23 FALSE FALSE
3         1 2012-06-05 2012-07-23 2012-10-23 2013-01-23 FALSE FALSE
4         2 2012-07-03 2012-07-23 2012-10-23 2013-01-23 FALSE FALSE
5         2 2012-08-14 2012-07-23 2012-10-23 2013-01-23  TRUE FALSE
6         3 2012-09-11 2012-07-23 2012-10-23 2013-01-23  TRUE FALSE
7         3 2012-09-21 2012-07-23 2012-10-23 2013-01-23  TRUE FALSE
8         3 2012-10-09 2012-07-23 2012-10-23 2013-01-23  TRUE FALSE