I have some data that contains dates. I'm trying to group the data by consecutive dates, however, the dates are not exactly consecutive. Here is an example:
DateColumn | Value
------------------------+-------
2017-01-18 01:12:34.107 | 215426 <- batch no. 1
2017-01-18 01:12:34.113 | 215636
2017-01-18 01:12:34.623 | 123516
2017-01-18 01:12:34.633 | 289926
2017-01-18 04:58:42.660 | 259063 <- batch no. 2
2017-01-18 04:58:42.663 | 261830
2017-01-18 04:58:42.893 | 219835
2017-01-18 04:58:42.907 | 250165
2017-01-18 05:18:14.660 | 134253 <- batch no. 3
2017-01-18 05:18:14.663 | 134257
2017-01-18 05:18:14.667 | 134372
2017-01-18 05:18:15.040 | 181679
2017-01-18 05:18:15.043 | 226368
2017-01-18 05:18:15.043 | 227070
The data is generated in batches and each row inside a batch takes a few milliseconds to generate. I'm trying to group the results as follows:
Date1 | Date2 | Count
------------------------+-------------------------+------
2017-01-18 01:12:34.107 | 2017-01-18 01:12:34.633 | 4
2017-01-18 04:58:42.660 | 2017-01-18 04:58:42.907 | 4
2017-01-18 05:18:14.660 | 2017-01-18 05:18:15.043 | 6
It is safe to assume that if two consecutive rows are more than 1 minute apart then they belong to a different batch.
I tried solutions involving ROW_NUMBER
function but they work with consecutive dates (date difference between two rows is fixed). How can I achieve desired result when the difference is fuzzy?
Please note that a batch could be much longer than a minute. For example a batch might consist of rows starting from 2017-01-01 00:00:00 and ending at 2017-01-01 00:05:00 consisting of ~3000 rows and each row few dozen or hundred millisecond apart. What is for certain is that batches are at least 1 minute apart.
datetime
value. – Salman A