I have an event log in the format below.
Original format
I have created groups by DATE and ID using dplyr,hence a change in either date or ID will be taken as a different group.
I want to have only events that are >= 5secs time interval and remove the rest. Desired output
I have used dplyr and time lag to acheive this,since I am not able to dynamically assign a lag interval for this. But my current code checks for one lag interval and I end up deleting more rows than desired.Current output - all rows in yellow are removed. Ideally I would want "13:10:22", "13:10:24" in group 2 to be retained since the time lag from "13:10:17" to these times is 5 secs and more.
I am using "chron" to handle the times. I understand the time lag logic will not work in mycase. Could there be a better alternative apart from using an expensive for/if loop.
Code I have used
data$Date <- as.Date(data$Date,format = "%m/%d/%Y")
data$Time <- chron(times = data$Time)
data <- data %>% arrange(Date,Time,ID)
data$Group <- data %>% group_by(Date,ID) %>% group_indices
data <- data %>%
group_by(Group) %>%
mutate(time.difference = Time - lag(Time)) %>%
filter(time.difference >= 0.00005787 | is.na(time.difference))
Dput of the data
structure(list(Date = structure(c(17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17470, 17470, 17470, 17470), class = "Date"), Time = structure(c(0.936400462962963, 0.9425, 0.9425, 0.942511574074074, 0.942523148148148, 0.9703125, 0.548518518518519, 0.548530092592593, 0.54880787037037, 0.54880787037037, 0.548819444444444, 0.548842592592593, 0.548865740740741, 0.548888888888889, 0.557337962962963, 0.6140625, 0.618761574074074, 0.618958333333333, 0.622303240740741), format = "h:m:s", class = "times"), ID = c("P1", "P1", "P1", "P1", "P1", "P1", "P5", "P5", "P5", "P5", "P5", "P5", "P5", "P5", "P5", "P9", "P9", "P9", "P9")), .Names = c("Date", "Time", "ID"), row.names = c(NA, -19L), class = "data.frame")
dput
– akrun