I have data in the form of start and stop times (in the format minutes:seconds). A simplistic example might be the timestamp of a light turning on, and the subsequent timestamp of the light turning off.
For example:
Start Stop
00:03.1 00:40.9
00:55.0 01:38.2
01:40.0 02:01.1
I would like to rearrange the data so that I can eventually look at it in terms of whole-minute interval bins in R.
Option 1: Turn the data into a binary listing for each tenth of a second, then aggregate the data later by timestamp.
Time.in.sec Yes.or.No
0.0 N
0.1 N
... ...
3.0 N
3.1 Y
3.2 Y
... ...
40.8 Y
40.9 N
... ...
Option 2: Split the time intervals at the minute marks and aggregate total time per minute (starting at time = 0:00.0) with some sort of logical rule.
Start Stop
00:03.10 00:40.90
00:55.00 00:59.99
01:00.00 01:38.20
01:40.00 01:59.99
02:00.00 02:01.10
I have tried looking into lubridate functions (i.e., making each range into an interval class) and cut()
, but I can’t seem to figure out how to make either of these ideas work. I also am unclear whether packages such as zoo would be appropriate for this; honestly, I have very little experience with date/time formats and time series.
Other questions on Stackoverflow seem to be addressing making bins from raw timestamps (e.g., What is an efficient method for partitioning and aggregating intervals from timestamped rows in a data frame? and Aggregate data by equally spaced time intervals in R), but I essentially want to do the opposite.
EDIT 1: Here is a CSV-format of the example data, up through minute 6.
Start, Stop
00:03.1, 00:40.9
00:55.0, 01:38.2
01:40.0, 02:01.1
03:03.1, 04:30.3
04:50.0, 05:01.5
05:08.7, 05:22.0
05:40.1, 05:47.9
EDIT 2: My ultimate goal for this is to have the data in a format that I can use to chunk the observations into standardized time bins (Minute 1, Minute 2, etc.) to get a by-minute percentage of when the data is "Yes". Basically I want to get a summary of the distribution of states by minute, and since the data is binary, I can do this by looking at the "yes" state.
For the first 3 minutes (from 00:00.0 up until 03:00.0), the output would be something like this:
Minute time.yes.sec perc.time.yes
1 42.8 71.33
2 58.2 96.98
3 1.1 1.83
# *NOTE: Here, Minute 1 = [0, 60), Minute 2 = [60, 120), etc.; I'm not opposed
# to the reverse definitions though (Minute 1 = (0, 60], etc.).
I could alternatively look at the data as a cumulative distribution plot, with each successive time point updating the value of "total time yes". However, If I could get the data in the format of option 1, I would have the flexibility to look at the data either way.
library(tidyverse) ; df %>% mutate_all(funs(as.numeric(lubridate::ms(.)))) %>% rowwise() %>% mutate(instant = list(seq(Start, Stop, by = 0.1))) %>% unnest() %>% mutate(minute = cut(instant, (0:6) * 60)) %>% group_by(minute) %>% summarise(elapsed = (n()-1)/10)
– alistaire