0
votes

I need help with grouping overlapping date-ranges in Microsoft SQL Server 18.1. A sample of the data looks like this. I need to be able to group based on ID, Name, StartDate and EndDate. This should be done so, that if a ID 1's date-range overlaps or is less than 7 days apart from the next row of ID 1's date range, they should be assigned the same grouping ID. If the timegap between two lines is bigger than 7 days, they should be seperated as two groupings.

The data is characterized by having differing Start- and EndDate for almost all of the rows, so it cannot be grouped by Start- and EndDate. Instead the goal is to group all rows for one person where the rows date-ranges overlap with less than 7 days, and show the Start- and EndDate for the full period by using MIN and MAX as shown in the "desired output window". Each Person in the dataset can have up to 200 lines of differing date-ranges that need to be grouped if it overlaps with other date-ranges for the person in the dataset.

I suppose the solution requires running through all rows until all rows are grouped based on ID, Name and overlapping date ranges. The case is that match-grouping for period 1-4, firstly requires period 1 to be matched with period 2, then period 1-2 needs to be matched to period 3, and period 1-3 to be matched with period 4. It can be the case that Period 1's date-range (e.g. 01-01-2019 - 30-05-2019) ends later than period 2's (e.g. 05-02-2019 - 24-04-2019), where the period 1-2 comparison/match to period 3 should be the MAX EndDate for period 1-2 meaning 30-05-2019 in this case.

Period 1    Period 2    Period 3    Period 4
X           
            X       
                        X   
                                    X

I need help with making a code that gets me from step 0 - Raw Data to step 1 - Grouping by overlapping date-ranges (less than 7 days apart). I have tried CASE, LAG, LEAD, PARTITION BY and some different kind of loops but haven't found a solution on how to solve the problem.

Step 0 - Raw Data:

ID  Name            StartDate   EndDate
1   Peter Hanson    01-01-2018  15-02-2019
1   Peter Hanson    05-01-2019  23-02-2019
1   Peter Hanson    30-02-2019  18-04-2019
2   Eric Schmidt    05-01-2019  18-03-2019
2   Eric Schmidt    07-01-2019  25-05-2019
3   Martin Boyle    08-03-2018  12-01-2019
3   Martin Boyle    15-01-2019  17-04-2019
3   Martin Boyle    18-04-2019  12-05-2019
3   Martin Boyle    29-04-2019  31-09-2019

Step 1- Grouping by overlapping date-ranges (less than 7 days apart):

ID  Name            StartDate   EndDate     Grouping
1   Peter Hanson    01-01-2018  15-02-2019  1
1   Peter Hanson    05-01-2019  23-02-2019  1
1   Peter Hanson    30-02-2019  18-04-2019  2
2   Eric Schmidt    05-01-2019  18-03-2019  3
2   Eric Schmidt    07-01-2019  25-05-2019  3
3   Martin Boyle    08-03-2018  12-01-2019  4
3   Martin Boyle    23-01-2019  17-04-2019  5
3   Martin Boyle    18-04-2019  12-05-2019  5
3   Martin Boyle    29-04-2019  31-09-2019  5

Step 2 - Desired Output window:

ID  Name            StartDate   EndDate     Grouping
1   Peter Hanson    01-01-2019  23-02-2019  1
1   Peter Hanson    30-02-2019  18-04-2019  2
2   Eric Schmidt    05-01-2019  25-05-2019  3
3   Martin Boyle    08-03-2018  12-01-2019  4
3   Martin Boyle    23-01-2019  31-09-2019  5

I hope that somebody can help with this task.

1

1 Answers

0
votes

You want to identify where a group starts. Based on your description, you can use lag() -- although it total overlaps are allowed then a cumulative max() is more appropriate.

Then, the groups are the cumulative sum of the starts . . . and the rest is aggregation:

select id, name, min(startdate), max(enddate),
       dense_rank() over (order by id, min(startdate)) as grouping
from (select t.*,
             sum(case when prev_enddate >= dateadd(day, -7, startdate) then 0 else 1 end /*end*/
                ) over (partition by id order by startdate) as grp
      from (select t.*,
                   lag(enddate) over (partition by id order by startdate) as prev_enddate
            from t
           ) t
     ) t
group by id, name;