I have a large (20,000 obs) data.frame of hourly values, grouped by a unique id. I also have a list of dates, each of which occurs in the data.frame. I am trying to match these dates to the data.frame and then extract the observations whose datetimes fall within plus or minus a given time interval of each matching date. For example, given the following data.frame:
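setClass("myDate")  # register the class name so setAs() does not warn about an undefined class
# the coercion below lets read.table() parse the datetime column as POSIXct in UTC
setAs("character", "myDate", function(from) as.POSIXct(from, format = "%m/%d/%Y %H:%M", tz = "UTC"))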
df <- read.table(textConnection("datetimeUTC id value
'5/1/2013 5:00' 153 0.53
'5/1/2013 6:00' 153 0.46
'5/1/2013 7:00' 153 0.53
'5/1/2013 8:00' 153 0.46
'5/1/2013 9:00' 153 0.44
'5/1/2013 10:00' 153 0.48
'5/1/2013 11:00' 153 0.49
'5/1/2013 12:00' 153 0.49
'5/1/2013 13:00' 153 0.51
'5/1/2013 14:00' 153 0.53
'11/24/2013 9:00' 154 0.45
'11/24/2013 10:00' 154 0.46
'11/24/2013 11:00' 154 0.49
'11/24/2013 12:00' 154 0.55
'11/24/2013 13:00' 154 0.61
'11/24/2013 14:00' 154 0.7
'11/24/2013 15:00' 154 0.74
'11/24/2013 16:00' 154 0.78
'11/24/2013 17:00' 154 0.77
'11/24/2013 18:00' 154 0.79
'8/2/2015 1:00' 240 0.2
'8/2/2015 2:00' 240 0.2
'8/2/2015 3:00' 240 0.2
'8/2/2015 4:00' 240 0.22
'8/2/2015 5:00' 240 0.22
'8/2/2015 6:00' 240 0.27
'8/2/2015 7:00' 240 0.23
'8/2/2015 8:00' 240 0.21
'8/2/2015 9:00' 240 0.22
'8/2/2015 10:00' 240 0.22
'8/2/2015 11:00' 240 0.21
'8/2/2015 12:00' 240 0.21
'8/2/2015 13:00' 240 0.21
'8/2/2015 14:00' 240 0.22
'8/2/2015 15:00' 240 0.24
'8/2/2015 16:00' 240 0.25
'8/2/2015 17:00' 240 0.12
'8/2/2015 18:00' 240 0.32
"), header=TRUE, colClasses=c("myDate", "character", "numeric"))
For each id, I want to extract all observations that fall within 2 hours (inclusive) before or after each matching datetime in this key:
key <- read.table(textConnection("
datetimeUTC id
'5/1/2013 9:00' 153
'11/24/2013 14:00' 154
'8/2/2015 5:00' 240
'8/2/2015 15:00' 240"), header=TRUE, colClasses=c("myDate", "character"))
The desired result would look as follows:
result <- read.table(textConnection("datetimeUTC id value
'5/1/2013 7:00' 153 0.53
'5/1/2013 8:00' 153 0.46
'5/1/2013 9:00' 153 0.44
'5/1/2013 10:00' 153 0.48
'5/1/2013 11:00' 153 0.49
'11/24/2013 12:00' 154 0.55
'11/24/2013 13:00' 154 0.61
'11/24/2013 14:00' 154 0.7
'11/24/2013 15:00' 154 0.74
'11/24/2013 16:00' 154 0.78
'8/2/2015 3:00' 240 0.2
'8/2/2015 4:00' 240 0.22
'8/2/2015 5:00' 240 0.22
'8/2/2015 6:00' 240 0.27
'8/2/2015 7:00' 240 0.23
'8/2/2015 13:00' 240 0.21
'8/2/2015 14:00' 240 0.22
'8/2/2015 15:00' 240 0.24
'8/2/2015 16:00' 240 0.25
'8/2/2015 17:00' 240 0.12
"), header=TRUE, colClasses=c("myDate", "character", "numeric"))
This seems like a simple task, but I can't get the result I want. Here are a couple of things I have tried:
result <- df[which(df$id == key$id & (df$datetimeUTC >= key$datetimeUTC - 2*60*60 | df$datetimeUTC <= key$datetimeUTC + 2*60*60)), ]
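As far as I can tell, this first attempt has two problems. `df$id == key$id` compares the two vectors position by position, recycling the shorter one, instead of looking up each id in the key. And joining the two time conditions with `|` lets every row through, because any datetime satisfies at least one of them. The toy vectors below are hypothetical stand-ins, not my real data, and only illustrate those two behaviours:

```r
# Toy vectors (hypothetical, not the real data) showing the two pitfalls.
ids  <- c("153", "153", "154", "240")
keys <- c("153", "154")

by_position   <- ids == keys    # keys is recycled: compares against 153, 154, 153, 154
by_membership <- ids %in% keys  # TRUE wherever the id appears anywhere in keys

x  <- c(1, 5, 9)
lo <- 4
hi <- 6
with_or  <- x >= lo | x <= hi   # every value passes at least one of the two tests
with_and <- x >= lo & x <= hi   # only values inside [lo, hi] pass both tests
```

Note that `%in%` alone would not be enough for my case either, since the time window depends on which key row an observation is paired with.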
library(data.table)
dt <- setDT(df)
dt[dt$datetimeUTC %between% c(dt$datetimeUTC - 2*60*60, dt$datetimeUTC + 2*60*60)]
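For what it's worth, the logic I am after can be sketched in base R as a merge on id followed by a filter on the time difference. This is only a rough sketch under my own assumptions, run on a small subset of the example data; the names `parse_utc`, `keytime`, `window`, `pairs`, and `out` are placeholders I made up. I don't know whether it is a sensible approach for the full data, since the merge pairs every observation with every key row that shares its id:

```r
# Helper (hypothetical name) that parses the example's datetime strings as UTC.
parse_utc <- function(x) as.POSIXct(x, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Small subset of the example data, rebuilt so the sketch runs on its own.
df <- data.frame(
  datetimeUTC = parse_utc(c("5/1/2013 6:00", "5/1/2013 7:00", "5/1/2013 9:00",
                            "5/1/2013 11:00", "5/1/2013 12:00",
                            "8/2/2015 3:00", "8/2/2015 8:00", "8/2/2015 13:00")),
  id    = c("153", "153", "153", "153", "153", "240", "240", "240"),
  value = c(0.46, 0.53, 0.44, 0.49, 0.49, 0.20, 0.21, 0.21),
  stringsAsFactors = FALSE
)
key <- data.frame(
  datetimeUTC = parse_utc(c("5/1/2013 9:00", "8/2/2015 5:00", "8/2/2015 15:00")),
  id = c("153", "240", "240"),
  stringsAsFactors = FALSE
)

window <- 2 * 60 * 60  # half-width of the window, in seconds

# Rename the key's datetime so it does not clash with df's column in the merge.
names(key)[names(key) == "datetimeUTC"] <- "keytime"

# Pair each observation with every key row that has the same id.
pairs <- merge(df, key, by = "id")

# Keep pairs whose observation lies within +/- window of the key datetime.
gap  <- abs(as.numeric(difftime(pairs$datetimeUTC, pairs$keytime, units = "secs")))
out  <- pairs[gap <= window, c("datetimeUTC", "id", "value")]

# Sort, and drop rows that matched more than one overlapping window.
out <- unique(out[order(out$id, out$datetimeUTC), ])
```

On this subset, `out` keeps the 7:00, 9:00 and 11:00 rows for id 153 and the 3:00 and 13:00 rows for id 240, which matches what I expect from the key.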