I am trying to join two data tables using a rolling join. My time/date stamps are POSIXct specified as eastern standard time zone (EST). During the join, values using GMT are instead retrieved, i.e., the value from EST - 5 hours. Additionally, it's making corrections for daylight savings time, retrieving values from EST - 4 hours during daylight savings time.
I've tried using fuzzyjoin
and that works correctly but suffers from being very slow even for small joins and runs out of resources for larger problems (I'm guessing because it's doing a full cartesian join then filtering) so won't run my real job. I've looked through stackoverlow and I haven't seen this problem described
In the example below, the flow data is stream flow measured at 15-minute intervals (in cubic meters per second) and the chem data are results from analyzing grab samples. What I want is to find the value of stream flow (Q.cms) that is closest in time to the time the grab sample was collected. In the result, Q.cms retrieved from the flow data should match flow.EST and not flow.GMT (flow.EST and flow.GMT were obtained using the vlookup function in Excel). You can look at the resulting table or use the ggplot to visualize the results. The black symbols should fall on the 1:1 line not the red symbols. Some of this follows the examples given here https://r-norberg.blogspot.com/2016/06/understanding-datatable-rolling-joins.html. In the roll statement, I've also tried "-Inf" and "T" - neither help.
library(tidyverse)
library(data.table)
NHchem <- tibble(SampleDateTime = list("2012-03-07 15:15:00",
"2012-03-07 07:57:00",
"2012-03-07 09:00:00",
"2012-03-07 08:31:00",
"2012-03-07 08:00:00"),
Parameter = rep("HgD", 5),
Value = c(11.83, 9.09, 8.98, 10.91, 10.91),
flow.EST = c(1.734292, 1.730299, 1.711665, 1.721448, 1.729299),
flow.GMT = c(1.729082, 1.855115, 1.818062, 1.838623, 1.855263)
)
NHchem$SampleDateTime <- lubridate::ymd_hms(NHchem$SampleDateTime, tz = "EST")
NHflow <- tibble(DateTime = list("2012-03-07 00:00:00", "2012-03-07 00:15:00", "2012-03-07 00:30:00",
"2012-03-07 00:45:00", "2012-03-07 01:00:00", "2012-03-07 01:15:00",
"2012-03-07 01:30:00", "2012-03-07 01:45:00", "2012-03-07 02:00:00",
"2012-03-07 02:15:00", "2012-03-07 02:30:00", "2012-03-07 02:45:00",
"2012-03-07 03:00:00", "2012-03-07 03:15:00", "2012-03-07 03:30:00",
"2012-03-07 03:45:00", "2012-03-07 04:00:00", "2012-03-07 04:15:00",
"2012-03-07 04:30:00", "2012-03-07 04:45:00", "2012-03-07 05:00:00",
"2012-03-07 05:15:00", "2012-03-07 05:30:00", "2012-03-07 05:45:00",
"2012-03-07 06:00:00", "2012-03-07 06:15:00", "2012-03-07 06:30:00",
"2012-03-07 06:45:00", "2012-03-07 07:00:00", "2012-03-07 07:15:00",
"2012-03-07 07:30:00", "2012-03-07 07:45:00", "2012-03-07 08:00:00",
"2012-03-07 08:15:00", "2012-03-07 08:30:00", "2012-03-07 08:45:00",
"2012-03-07 09:00:00", "2012-03-07 09:15:00", "2012-03-07 09:30:00",
"2012-03-07 09:45:00", "2012-03-07 10:00:00", "2012-03-07 10:15:00",
"2012-03-07 10:30:00", "2012-03-07 10:45:00", "2012-03-07 11:00:00",
"2012-03-07 11:15:00", "2012-03-07 11:30:00", "2012-03-07 11:45:00",
"2012-03-07 12:00:00", "2012-03-07 12:15:00", "2012-03-07 12:30:00",
"2012-03-07 12:45:00", "2012-03-07 13:00:00", "2012-03-07 13:15:00",
"2012-03-07 13:30:00", "2012-03-07 13:45:00", "2012-03-07 14:00:00",
"2012-03-07 14:15:00", "2012-03-07 14:30:00", "2012-03-07 14:45:00",
"2012-03-07 15:00:00", "2012-03-07 15:15:00", "2012-03-07 15:30:00",
"2012-03-07 15:45:00", "2012-03-07 16:00:00", "2012-03-07 16:15:00",
"2012-03-07 16:30:00", "2012-03-07 16:45:00", "2012-03-07 17:00:00",
"2012-03-07 17:15:00", "2012-03-07 17:30:00", "2012-03-07 17:45:00",
"2012-03-07 18:00:00", "2012-03-07 18:15:00", "2012-03-07 18:30:00",
"2012-03-07 18:45:00", "2012-03-07 19:00:00", "2012-03-07 19:15:00",
"2012-03-07 19:30:00", "2012-03-07 19:45:00", "2012-03-07 20:00:00",
"2012-03-07 20:15:00", "2012-03-07 20:30:00", "2012-03-07 20:45:00",
"2012-03-07 21:00:00", "2012-03-07 21:15:00", "2012-03-07 21:30:00",
"2012-03-07 21:45:00", "2012-03-07 22:00:00", "2012-03-07 22:15:00",
"2012-03-07 22:30:00", "2012-03-07 22:45:00", "2012-03-07 23:00:00",
"2012-03-07 23:15:00", "2012-03-07 23:30:00", "2012-03-07 23:45:00"
),
Q.cms = c(1.9183428976162, 1.90884554590853, 1.90659614969849, 1.89946370922413,
1.89667381849492, 1.88823268972675, 1.89009845845496, 1.88925133710857,
1.89175644143891, 1.88902902166044, 1.88199493586431, 1.87318875971247,
1.86917038183388, 1.86071830324102, 1.85932613686066, 1.86093401873381,
1.87101369024615, 1.87296981545807, 1.8762632040545, 1.86625922525504,
1.86445655629997, 1.86091440792692, 1.86261797589624, 1.85849714690774,
1.85809949483732, 1.85365423835426, 1.86769219811402, 1.87137568188775,
1.86957715377645, 1.86178854932717, 1.85842153505713, 1.85511511281169,
1.85526345829864, 1.84606992374992, 1.8386232796469, 1.8306698161104,
1.81806190831676, 1.80972662643993, 1.80671560966769, 1.79692071214778,
1.79246415779217, 1.78383758090983, 1.77911151556738, 1.77540849330392,
1.76977546791765, 1.76704647349273, 1.75506646810193, 1.75116656853259,
1.74116032441497, 1.73694852237565, 1.73539800459245, 1.73029929958069,
1.72929873358336, 1.72145096214292, 1.72144822481096, 1.7137286373663,
1.71166508485562, 1.72512015058183, 1.72801058727519, 1.71946126373971,
1.72908219540057, 1.73003063754833, 1.73433620637326, 1.74704929729279,
1.75374511203315, 1.75751451176466, 1.74550984385788, 1.7374425791972,
1.73028285052801, 1.73164553811319, 1.71538821542136, 1.71111583998635,
1.69850042893639, 1.7018112620745, 1.70068743638687, 1.70624871061042,
1.70513766128531, 1.70189855875322, 1.71388172504149, 1.70687667156756,
1.72965507980508, 1.73429231190258, 1.73062829040751, 1.74161628160761,
1.74502884750137, 1.74764319440737, 1.76775550519358, 1.74926846992512,
1.75406991794209, 1.75302950118107, 1.75219840078475, 1.75821138471585,
1.74749746409036, 1.75047602988317, 1.76192575356191, 1.76360735465132)
)
NHflow$DateTime <- lubridate::ymd_hms(NHflow$DateTime, tz = "EST")
NHflowDT <- setDT(NHflow)
NHchemDT <- setDT(NHchem)
NHflowDT[, join_time := DateTime]
NHchemDT[, join_time := SampleDateTime]
setkey(NHflowDT, join_time)
setkey(NHchemDT, join_time)
NHflow_chem <- NHflowDT[ NHchemDT, roll = 'nearest']
NHflow_chem
ggplot(data = NHflow_chem, mapping = aes(x = flow.EST, y = Q.cms)) +
geom_point(size = 4) +
geom_point(aes(x = flow.GMT, y = Q.cms), color = "red", size = 4) +
geom_abline(slope = 1, intercept = 0) +
labs(x = "flow.EST (black), flow.GMT (red)")