1
votes

I have a dataset where each row has a date range. I want to combine records into single date ranges if they overlap or there's a gap of less than 30 days and they share the same ID number. If it's more than 30 days, I want them to remain separate. I can figure out how to do it if they are overlapping, and I can figure out how to do it no matter the size of the gap, but I can't figure out how to do it with a limited gap allowance.

So, for example, if my data looks like this:

ID        Date1           Date2
ABC       2018-01-01      2018-02-14
ABC       2018-02-13      2018-03-17
ABC       2018-04-01      2018-07-24
DEF       2017-01-01      2017-06-30
DEF       2017-10-01      2017-12-01

I want it to come out like this:

ID        Date1           Date2
ABC       2018-01-01      2018-07-24
DEF       2017-01-01      2017-06-30
DEF       2017-10-01      2017-12-01

The three date ranges for ABC are combined, because they either overlap or the gaps are less than 30 days. The two date ranges for DEF stay separate, because the gap between them is larger than 30 days.

I'm using Microsoft SSMS.

1
Can you share the code you have for the two scenarios you have working? A light tweak to one of those will do this for you.Sean Lange

1 Answers

2
votes

You can identify where the new periods begin. For a general problem, I would go with not exists. Then you can assign a group using cumulative sums:

select id, sum(is_start) over (partition by id order by datestart) as grp
from (select t.*,
             (case when not exists (select 1
                                    from t t2
                                    where t2.id = t.id and
                                          t2.date1 >= dateadd(day, -30, t1.date1) and
                                          t2.date2 < dateadd(day, 30, t1.date2)
                                   )
                   then 1 else 0
              end) as is_start
      from t
     ) t;

The final step is aggregation:

with g as (
      select id, sum(is_start) over (partition by id order by datestart) as grp
      from (select t.*,
                   (case when not exists (select 1
                                          from t t2
                                          where t2.id = t.id and
                                                t2.date1 >= dateadd(day, -30, t1.date1) and
                                                t2.date2 < dateadd(day, 30, t1.date2)
                                         )
                         then 1 else 0
                    end) as is_start
            from t
           ) t
      )
select id, min(date1), max(date2)
from g
group by id, grp;