Extract data by minimum time interval between observations in a time series

Question

I have a very long set of data collected from animal transmitters. Due to variable recharge of the tranmsitter's solar batteries, the interval between data points is highly variable (ranging from 180 seconds up to over one hour). I want to subset the data so the interval between points is a minimum of 10 minutes, or 600 seconds.

Here is what a small subset of my data looks like:

    datetime             id
    01/09/2015 14:10:54  A
    01/09/2015 14:26:56  A
    01/09/2015 14:41:28  A
    01/09/2015 14:43:53  A
    01/09/2015 14:46:37  A
    01/09/2015 14:48:57  A
    01/09/2015 14:51:31  A
    01/09/2015 14:54:08  A
    04/09/2015 14:37:07  B
    04/09/2015 14:52:07  B
    04/09/2015 15:07:04  B
    04/09/2015 15:15:35  B
    04/09/2015 15:18:00  B
    04/09/2015 15:20:23  B
    04/09/2015 15:22:49  B
    04/09/2015 15:25:12  B 
    04/09/2015 15:28:52  B

My desired output with a minimum interval of 10 minutes would be:

    datetime             id
    01/09/2015 14:10:54  A
    01/09/2015 14:26:56  A
    01/09/2015 14:41:28  A
    01/09/2015 14:51:31  A
    01/09/2015 14:37:07  B
    04/09/2015 14:52:07  B
    04/09/2015 15:07:04  B
    04/09/2015 15:18:00  B 
    04/09/2015 15:28:52  B

I found an almost exact question with an answer here. Their data included id, date and time. Here is the code given in the answer:

    library(dplyr)
    library(lubridate)

    locdata %>% 
       mutate(timestamp = dmy_hm(paste(date, time))) %>%
       group_by(id, date) %>%
       mutate(delta = timestamp - first(timestamp),
       steps = as.numeric(floor(delta / 3600)), 
       change = ifelse(is.na(steps - lag(steps)), 1, steps - lag(steps))) %>%
          filter(change > 0) %>%
          select(id, date, timestamp)

I adapted this slightly to my data as below:

    result <- mydata %>%
                group_by(id) %>%                          
                mutate(delta = datetime - first(datetime),        
                steps = as.numeric(floor(delta / 600)),
                change = ifelse(is.na(steps - lag(steps)), 1, steps - lag(steps)))

The code results in this output:

    datetime             id    delta   steps   change
    01/09/2015 14:10:54  A     0        0      1
    01/09/2015 14:26:56  A     962      1      1
    01/09/2015 14:41:28  A     1834     3      2
    01/09/2015 14:51:31  A     2437     4      1
    04/09/2015 14:37:07  B     0        0      1
    04/09/2015 14:52:07  B     900      1      1
    04/09/2015 15:07:04  B     1797     2      1
    04/09/2015 15:15:35  B     2308     3      1
    04/09/2015 15:18:00  B     2453     4      1
    04/09/2015 15:22:29  B     3105     5      1

The output gives the first data point in each 10 minute time block starting at time zero (per id). This is not exactly what I need, as some of the time points are less than 10 mins apart. What I need is the next time that is 10 mins or more after the previous point within each id.

Any idea how I could do this? Would I need to use a loop? Thanks for any ideas.

Can you add an example of your data frame? dput(locdata[1:30, ]) will do. — Istrel
Thanks for the response. Here is the 17 rows of data i included originally; "","datetime","id" "1",2015-09-01 14:10:54,"A" "2",2015-09-01 14:26:56,"A" "3",2015-09-01 14:41:28,"A" "4",2015-09-01 14:43:53,"A" "5",2015-09-01 14:46:37,"A" "6",2015-09-01 14:48:57,"A" "7",2015-09-01 14:51:31,"A" "8",2015-09-01 14:54:08,"A" "9",2015-09-04 14:37:07,"B" "10",2015-09-04 14:52:07,"B" "11",2015-09-04 15:07:04,"B" "12",2015-09-04 15:15:35,"B" "13",2015-09-04 15:18:00,"B" "14",2015-09-04 15:20:23,"B" "15",2015-09-04 15:22:49,"B" "16",2015-09-04 15:25:12,"B" "17",2015-09-04 15:28:52,"B" — KazRiri
@Istrel is this what you meant? There are only 17 rows here but I can add more if need be. Thanks again, and see my comments on the response below. — KazRiri

Jrmie Jrmie · Accepted Answer · 2019-10-21T10:14:17

I had this idea using dplyr (with lead() function) and a while loop, it mights help you

library(dplyr)
library(lubridate)

data <- data %>%
  mutate(date = lubridate::ymd_hms(datetime),
         id_rows = 1:nrow(.)) %>%
  group_by(id) %>%
  mutate(delta = lubridate::time_length(lag(date) %--% date, unit = "sec"))

while (min(data$delta, na.rm = T) < 600) {
  rm_rows <- data %>%
    filter(delta < 600) %>%
    filter(date == min(date)) %>%
    pull(id_rows)
  data <- data %>%
    filter(!id_rows %in% rm_rows) %>%
    mutate(delta = lubridate::time_length(lag(date) %--% date, unit = "sec"))
}

Extract data by minimum time interval between observations in a time series

1 Answers