In the following data set I have filtered JSON intervals by instances where the bike count is equal to zero. station_summary_id represents one time interval and increases by consecutive integers (in the example you see that 64129 is associated with "2014-10-01 07:00:00", then 64130 is associated with "2014-10-01 07:10:00" , and so on. station_id is the unique id of a station.
My objective is: to find the longest chain of consecutive integers by station_id -in other words - to find out the longest time period that each station was empty. I understand that this requires grouping first by station_id and then counting the longest consecutive sequence in station_summary_id but am not sure how to automate this for all station ids.
> dim(data)
[1] 307039 7
> head(data)
station_id status available_bike_count created_at station_summary_id month year
13694 2 Active 0 2014-10-01 07:00:00 64129 10 2014
13702 10 Active 0 2014-10-01 07:00:00 64129 10 2014
13706 14 Active 0 2014-10-01 07:00:00 64129 10 2014
13710 18 Active 0 2014-10-01 07:00:00 64129 10 2014
13713 21 Active 0 2014-10-01 07:00:00 64129 10 2014
13728 36 Active 0 2014-10-01 07:00:00 64129 10 2014
reproducible example:
> dput(dat)
structure(list(station_id = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L), status = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = "Active", class = "factor"), available_bike_count = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), station_summary_id = c(64129L,
64130L, 64131L, 64132L, 64133L, 64134L, 64136L, 64138L, 64139L,
64140L, 64141L, 64142L, 64143L, 64144L, 64145L, 64146L, 64147L,
64148L, 64149L, 64150L, 64152L, 64161L, 64162L, 64170L, 64273L,
64322L, 64324L, 64341L, 64884L, 64886L, 64896L, 64897L, 64898L,
64899L, 64900L, 64901L, 64902L, 64903L, 64904L, 64905L, 64906L,
64907L, 64908L, 64909L, 64910L, 64911L, 64912L, 64913L, 64917L,
64918L, 65214L, 65219L, 66314L, 66439L, 66450L, 66583L, 66587L,
66589L, 66600L, 66872L, 66880L, 67037L, 67048L, 82854L, 82855L,
82856L, 82857L, 82858L, 82859L, 82860L, 82861L, 82862L, 82863L,
82867L, 82868L), month = c(10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L), year = c(2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L)), .Names = c("station_id",
"status", "available_bike_count", "station_summary_id", "month",
"year"), row.names = c(NA, -75L), class = "data.frame")