I would like to collapse consecutive rows that share the same sensor name into a single row holding the start and end time of that run. Here is the data set:

                  time      name
1  2019-07-07 15:01:00 SENSOR_07
2  2019-07-07 15:01:10 SENSOR_07
3  2019-07-07 15:01:20 SENSOR_07
4  2019-07-07 15:01:30 SENSOR_07
5  2019-07-07 15:01:40 SENSOR_07
6  2019-07-07 15:01:50 SENSOR_05
7  2019-07-07 15:02:00 SENSOR_05
8  2019-07-07 15:02:10 SENSOR_05
9  2019-07-07 15:02:20 SENSOR_07
10 2019-07-07 15:02:30 SENSOR_05
11 2019-07-07 15:02:40 SENSOR_07
12 2019-07-07 15:02:50 SENSOR_07
13 2019-07-07 15:03:00 SENSOR_07
14 2019-07-07 15:03:10 SENSOR_07
15 2019-07-07 15:03:20 SENSOR_07

structure(list(time = structure(1:15, .Label = c("2019-07-07 15:01:00", "2019-07-07 15:01:10", "2019-07-07 15:01:20", "2019-07-07 15:01:30", "2019-07-07 15:01:40", "2019-07-07 15:01:50", "2019-07-07 15:02:00", "2019-07-07 15:02:10", "2019-07-07 15:02:20", "2019-07-07 15:02:30", "2019-07-07 15:02:40", "2019-07-07 15:02:50", "2019-07-07 15:03:00", "2019-07-07 15:03:10", "2019-07-07 15:03:20"), class = "factor"), name = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("SENSOR_05", "SENSOR_07"), class = "factor")), class = "data.frame", row.names = c(NA, -15L))

A sensor name is recorded every 10 seconds. Rows 1-5, for example, should be aggregated into the first row of the expected table (SENSOR_07 from 15:01:00 to 15:01:50, the moment SENSOR_05 first appears). Here is the expected table:

           min_time                max_time          name
2019-07-07 15:01:00     2019-07-07 15:01:50     SENSOR_07
2019-07-07 15:01:50     2019-07-07 15:02:20     SENSOR_05
2019-07-07 15:02:20     2019-07-07 15:02:30     SENSOR_07
2019-07-07 15:02:30     2019-07-07 15:02:40     SENSOR_05
2019-07-07 15:02:40     2019-07-07 15:03:30     SENSOR_07

How can I do this in R, preferably with dplyr?

1 Answer

We can build a group for each run of identical names using lag and cumsum, then take the first and last value of time within each group:

library(dplyr)

df %>%
   group_by(group = cumsum(name != lag(name, default = first(name)))) %>%
   summarise(min_time = first(time), 
             max_time = last(time), 
             name = first(name)) %>%
   select(-group)

#  min_time            max_time            name     
#  <fct>               <fct>               <fct>    
#1 2019-07-07 15:01:00 2019-07-07 15:01:40 SENSOR_07
#2 2019-07-07 15:01:50 2019-07-07 15:02:10 SENSOR_05
#3 2019-07-07 15:02:20 2019-07-07 15:02:20 SENSOR_07
#4 2019-07-07 15:02:30 2019-07-07 15:02:30 SENSOR_05
#5 2019-07-07 15:02:40 2019-07-07 15:03:20 SENSOR_07
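
To see why the grouping expression works, here is a minimal sketch on a short, made-up name vector (not the question's data). It assumes only that dplyr is loaded for lag and first:

```r
library(dplyr)

# Hypothetical toy vector: three runs of identical names.
name <- c("SENSOR_07", "SENSOR_07", "SENSOR_05", "SENSOR_05", "SENSOR_07")

# TRUE wherever a value differs from the one before it. The first element is
# compared with itself (via default = first(name)), so it is always FALSE.
changed <- name != lag(name, default = first(name))
changed
#> [1] FALSE FALSE  TRUE FALSE  TRUE

# The running total of the change flags gives one id per run, starting at 0.
group <- cumsum(changed)
group
#> [1] 0 0 1 1 2
```

Because the id only increases when the name changes, a sensor that reappears later gets a new group instead of being merged with its earlier run.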

However, it looks like you want the minimum and maximum time for each run, and your time column is a factor. We can convert it to POSIXct and then take min and max within each group. Also, if you want contiguous intervals, where the end of one group is the start of the next, we can use lead to overwrite max_time. Note that the last group has no successor, so its max_time falls back to its own min_time.

df %>%
   mutate(time = as.POSIXct(time)) %>%
   group_by(group = cumsum(name != lag(name, default = first(name)))) %>%
   summarise(min_time = min(time), 
             max_time = max(time), 
             name = first(name)) %>%
   mutate(max_time = lead(min_time, default = last(min_time))) %>%
   select(-group)

#  min_time            max_time            name     
#  <dttm>              <dttm>              <fct>    
#1 2019-07-07 15:01:00 2019-07-07 15:01:50 SENSOR_07
#2 2019-07-07 15:01:50 2019-07-07 15:02:20 SENSOR_05
#3 2019-07-07 15:02:20 2019-07-07 15:02:30 SENSOR_07
#4 2019-07-07 15:02:30 2019-07-07 15:02:40 SENSOR_05
#5 2019-07-07 15:02:40 2019-07-07 15:02:40 SENSOR_07
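
The last row above ends at its own start time, whereas the expected table in the question closes it at 15:03:30. One possible variation, sketched here under the assumption that readings really do arrive every 10 seconds (the step variable is hypothetical, not part of the original answer), is to close the final interval one step after the last observation. The data frame is rebuilt with seq and rep, using UTC as an arbitrary time zone, so the snippet runs on its own:

```r
library(dplyr)

# Rebuild the question's 15 rows: one reading every 10 seconds.
df <- data.frame(
  time = seq(as.POSIXct("2019-07-07 15:01:00", tz = "UTC"),
             by = 10, length.out = 15),
  name = rep(c("SENSOR_07", "SENSOR_05", "SENSOR_07", "SENSOR_05", "SENSOR_07"),
             times = c(5, 3, 1, 1, 5))
)

step <- 10  # assumed sampling interval in seconds

res <- df %>%
  group_by(group = cumsum(name != lag(name, default = first(name)))) %>%
  summarise(min_time = min(time),
            max_time = max(time),
            name = first(name)) %>%
  # Each run ends where the next one starts; the final run ends one
  # sampling step after its last observed time.
  mutate(max_time = lead(min_time, default = last(max_time) + step)) %>%
  select(-group)

res
```

If the sampling interval is not constant, using default = last(max_time) without the added step would close the final interval at the last observed time instead.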