Sorry, I'm very new to R and I'm not a data expert. I'm trying to calculate a total duration per patient without double-counting overlapping dates. I suspect lubridate is the answer. My data set looks like this:
patientnumber rxnumber startdate stopdate
100 1 1/1/2014 1/5/2014
100 2 1/1/2014 1/5/2014
100 3 1/20/2014 1/22/2014
200 4 2/14/2014 2/14/2014
200 5 2/15/2014 2/20/2014
I'd like to obtain a value of 8 for patient 100 (5 + 3) and 7 for patient 200 (1 + 6), i.e. a total exposure for each patient in which overlapping days are counted only once.
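For reference, here is a small base-R sketch that reads the table exactly as posted, parses the month/day/year strings with `as.Date(..., format = "%m/%d/%Y")` (lubridate's `mdy()` would also work), and sums inclusive per-row durations. It shows why a plain sum is not enough: patient 100 comes out as 13 instead of 8, because rows 1 and 2 cover the same five days. The column name `duration` and the variable name `naive` are illustrative choices.

```r
# Read the table as posted (whitespace-separated, with a header row)
rx <- read.table(text = "
patientnumber rxnumber startdate stopdate
100 1 1/1/2014 1/5/2014
100 2 1/1/2014 1/5/2014
100 3 1/20/2014 1/22/2014
200 4 2/14/2014 2/14/2014
200 5 2/15/2014 2/20/2014
", header = TRUE, stringsAsFactors = FALSE)

# Parse month/day/year strings into Date objects
rx$startdate <- as.Date(rx$startdate, format = "%m/%d/%Y")
rx$stopdate  <- as.Date(rx$stopdate,  format = "%m/%d/%Y")

# Inclusive duration of each prescription: a one-day record counts as 1
rx$duration <- as.numeric(rx$stopdate - rx$startdate) + 1

# Summing durations per patient counts the shared days of rows 1 and 2 twice
naive <- tapply(rx$duration, rx$patientnumber, sum)
print(naive)
```

Patient 200 already comes out right (7) because its two records do not share any day; only the overlapping rows inflate the total.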
The way I think I need to approach this is: calculate the minimum start date and the maximum stop date for each patient, then use a counter variable that steps one day at a time from the minimum start date. If the current day falls within one of the intervals, add one and move along; if it doesn't, just move along. Stop when the maximum stop date is reached.
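A minimal base-R sketch of the day-by-day counter described above, assuming the dates are month/day/year strings as in the sample. The helper names `parse_mdy`, `count_exposure_loop` and `count_exposure_set` are hypothetical. It also includes a loop-free variant that expands every interval into its days and counts the distinct ones, which is the same idea without an explicit counter.

```r
# Sample data from the question, with the m/d/Y strings parsed to Date
parse_mdy <- function(x) as.Date(x, format = "%m/%d/%Y")
rx <- data.frame(
  patientnumber = c(100, 100, 100, 200, 200),
  startdate = parse_mdy(c("1/1/2014", "1/1/2014", "1/20/2014", "2/14/2014", "2/15/2014")),
  stopdate  = parse_mdy(c("1/5/2014", "1/5/2014", "1/22/2014", "2/14/2014", "2/20/2014"))
)

# Day-by-day counter: walk from the earliest start to the latest stop and
# count each day that lies inside at least one prescription interval
count_exposure_loop <- function(start, stop) {
  total <- 0
  day <- min(start)
  last <- max(stop)
  while (day <= last) {
    if (any(start <= day & day <= stop)) {
      total <- total + 1
    }
    day <- day + 1  # adding 1 to a Date moves it forward one day
  }
  total
}

# Loop-free variant: expand every interval into its days (as integers)
# and count the distinct days
count_exposure_set <- function(start, stop) {
  s <- as.integer(start)
  e <- as.integer(stop)
  days <- unlist(lapply(seq_along(s), function(i) s[i]:e[i]))
  length(unique(days))
}

# Apply each function to every patient's rows
by_patient <- split(rx, rx$patientnumber)
exposure_loop <- sapply(by_patient, function(d) count_exposure_loop(d$startdate, d$stopdate))
exposure_set  <- sapply(by_patient, function(d) count_exposure_set(d$startdate, d$stopdate))
print(exposure_loop)
print(exposure_set)
```

Both versions give 8 for patient 100 and 7 for patient 200. The loop mirrors the counter idea closely but visits every day between the first start and the last stop, so it can get slow with long date ranges and many patients; the set version does the same counting in one pass over the intervals.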
I just don't know how to code this. This would be the most complex coding I've done in R and the first time I'd use a loop. Please help!
Update: @Richard, I appreciate the help. While scaling this up, I noticed some problems.
Assuming the same patient number for all rows, with increasing rx numbers:
startdate stopdate duration overlap
3/26/2014 3/26/2014 1 3 (this overlap is coming from the record above)
3/27/2014 3/27/2014 1 0
3/27/2014 3/27/2014 1 1
3/27/2014 3/30/2014 4 1
3/28/2014 3/28/2014 1 3 (I'm not sure how to fix this one)
The code works; it just needs some fine-tuning. I hope you can help, and in the meantime I'll keep trying to figure this out.
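The code from Richard's answer isn't shown here, so this is not a fix to it but a different technique: merge overlapping intervals first, then add up the lengths of the merged blocks, so no per-row overlap column is needed. A minimal base-R sketch, assuming each patient's start and stop dates are passed as Date vectors; `merged_exposure` and `parse_mdy` are hypothetical helper names. Because the rows are sorted first, a record such as 3/28/2014 to 3/28/2014 falls inside the already-open 3/27/2014 to 3/30/2014 block instead of being compared only with the row directly above it.

```r
parse_mdy <- function(x) as.Date(x, format = "%m/%d/%Y")

# Merge overlapping intervals, then sum the inclusive lengths of the merged
# blocks. Works on one patient's start/stop Date vectors in any row order.
merged_exposure <- function(start, stop) {
  o <- order(start, stop)     # sort by start date, ties broken by stop date
  s <- as.integer(start[o])   # days since 1970-01-01
  e <- as.integer(stop[o])
  total <- 0
  cur_s <- s[1]
  cur_e <- e[1]
  for (i in seq_along(s)[-1]) {
    if (s[i] <= cur_e) {
      # Shares at least one day with the current block: extend it if needed
      cur_e <- max(cur_e, e[i])
    } else {
      # Gap: close the current block (inclusive day count) and start a new one
      total <- total + (cur_e - cur_s + 1)
      cur_s <- s[i]
      cur_e <- e[i]
    }
  }
  total + (cur_e - cur_s + 1)
}

# The five rows from the update (the record above 3/26 is not listed)
upd_start <- parse_mdy(c("3/26/2014", "3/27/2014", "3/27/2014", "3/27/2014", "3/28/2014"))
upd_stop  <- parse_mdy(c("3/26/2014", "3/27/2014", "3/27/2014", "3/30/2014", "3/28/2014"))
print(merged_exposure(upd_start, upd_stop))  # 3/26 through 3/30: 5 days

# Original sample, one result per patient
rx <- data.frame(
  patientnumber = c(100, 100, 100, 200, 200),
  startdate = parse_mdy(c("1/1/2014", "1/1/2014", "1/20/2014", "2/14/2014", "2/15/2014")),
  stopdate  = parse_mdy(c("1/5/2014", "1/5/2014", "1/22/2014", "2/14/2014", "2/20/2014"))
)
exposure <- sapply(split(rx, rx$patientnumber),
                   function(d) merged_exposure(d$startdate, d$stopdate))
print(exposure)
```

The 5 reflects only the five update rows shown; that patient's real total also depends on the earlier record that isn't listed. On the original sample the same function gives 8 and 7.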