0
votes

I have an event log in the format below.

Original format
I have created groups by DATE and ID using dplyr,hence a change in either date or ID will be taken as a different group.

I want to have only events that are >= 5secs time interval and remove the rest. Desired output

I have used dplyr and time lag to acheive this,since I am not able to dynamically assign a lag interval for this. But my current code checks for one lag interval and I end up deleting more rows than desired.Current output - all rows in yellow are removed. Ideally I would want "13:10:22", "13:10:24" in group 2 to be retained since the time lag from "13:10:17" to these times is 5 secs and more.

I am using "chron" to handle the times. I understand the time lag logic will not work in mycase. Could there be a better alternative apart from using an expensive for/if loop.

Code I have used

data$Date <- as.Date(data$Date,format = "%m/%d/%Y")  
data$Time <- chron(times = data$Time)  

data <- data  %>% arrange(Date,Time,ID)    
data$Group <- data %>%  group_by(Date,ID) %>% group_indices    
data <- data %>%     
        group_by(Group)  %>%       
        mutate(time.difference = Time - lag(Time)) %>%    
        filter(time.difference >= 0.00005787 | is.na(time.difference))  

Dput of the data

structure(list(Date = structure(c(17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17469, 17470, 17470, 17470, 17470), class = "Date"), Time = structure(c(0.936400462962963, 0.9425, 0.9425, 0.942511574074074, 0.942523148148148, 0.9703125, 0.548518518518519, 0.548530092592593, 0.54880787037037, 0.54880787037037, 0.548819444444444, 0.548842592592593, 0.548865740740741, 0.548888888888889, 0.557337962962963, 0.6140625, 0.618761574074074, 0.618958333333333, 0.622303240740741), format = "h:m:s", class = "times"), ID = c("P1", "P1", "P1", "P1", "P1", "P1", "P5", "P5", "P5", "P5", "P5", "P5", "P5", "P5", "P5", "P9", "P9", "P9", "P9")), .Names = c("Date", "Time", "ID"), row.names = c(NA, -19L), class = "data.frame")

2
Please show a small reproducible example with dputakrun
@akrun added the data.Vinds

2 Answers

1
votes
library(dplyr)
data %>%
  group_by(Group) %>%
  arrange(Group, Date, Time) %>% 
  filter((Time - lag(Time)) >= 5.787037e-05 | row_number() == 1L)
1
votes
data$datetime <- as.POSIXct(paste(data$Date, data$Time), format="%m/%d/%Y %H:%M:%S")  
data$group <-  data %>% group_by(ID,by5sec=cut(datetime, breaks="5 sec")) %>%  group_indices
data_filter <- data %>% group_by(group) %>% filter(row_number()==1)

I did this in 2 steps since I wanted the intermediate results with the group indices to be written onto a CSV.