35
votes

I have two tables, both with start time and end time fields. I need to find, for each row in the first table, all of the rows in the second table where the time intervals intersect.

For example:

           <-----row 1 interval------->
<---find this--> <--and this--> <--and this-->

Please phrase your answer in the form of a SQL WHERE-clause, AND consider the case where the end time in the second table may be NULL.

Target platform is SQL Server 2005, but solutions from other platforms may be of interest also.

5
Presumably, if the end time in the second table is NULL, it should be treated as 'the period does not have an end time and continues into the future'. That's what the accepted answer does; it is a common interpretation, but it isn't the only possible interpretation.Jonathan Leffler
Merge Overlapping Intervals algorithm can give some leads.RBT
@RBT: "Please phrase your answer in the form of a SQL WHERE-clause" :-)Steven A. Lowe

5 Answers

62
votes
SELECT * 
FROM table1,table2 
WHERE table2.start <= table1.end 
AND (table2.end IS NULL OR table2.end >= table1.start)
2
votes

"solutions from other platforms may be of interest also."

SQL Standard defines OVERLAPS predicate:

Specify a test for an overlap between two events.

<overlaps predicate> ::=
   <row value constructor 1>  OVERLAPS <row value constructor 2> 

Example:

SELECT 1
WHERE ('2020-03-01'::DATE, '2020-04-15'::DATE) OVERLAPS 
      ('2020-02-01'::DATE, '2020-03-15'::DATE) 
-- 1

db<>fiddle demo

1
votes
select * from table_1 
right join 
table_2 on 
(
table_1.start between table_2.start and table_2.[end]
or
table_1.[end] between table_2.start and table_2.[end]
or
(table_1.[end] > table_2.start and table_2.[end] is null)
)

EDIT: Ok, don't go for my solution, it perfoms like shit. The "where" solution is 14x faster. Oops...

Some statistics: running on a db with ~ 65000 records for both table 1 and 2 (no indexing), having intervals of 2 days between start and end for each row, running for 2 minutes in SQLSMSE (don't have the patience to wait)

Using join: 8356 rows in 2 minutes

Using where: 115436 rows in 2 minutes

1
votes

It's sound very complicated until you start working from reverse. Below I illustrated ONLY GOOD CASES (no overlaps)! defined by those 2 simple conditions, we have no overlap ranges if condA OR condB is TRUE, so we going to reverse those: NOT condA AND NOT CondB, in our case I just reversed signs (> became <=)

/*
|--------| A                             \___  CondA: b.ddStart >  a.ddEnd
            |=========| B                /      \____ CondB:  a.ddS >  b.ddE
                          |+++++++++| A         /
*/
--DROP TABLE ran
create table ran ( mem_nbr int, ID int, ddS date, ddE date)
insert ran values  
(100, 1,  '2012-1-1','2012-12-30'),    ----\ ovl
(100, 11, '2012-12-12','2012-12-24'),  ----/
(100, 2, '2012-12-31','2014-1-1'),
(100, 3, '2014-5-1','2014-12-14') ,

(220, 1, '2015-5-5','2015-12-14') ,    ---\ovl
(220, 22, '2014-4-1','2015-5-25') ,    ---/
(220, 3, '2016-6-1','2016-12-16')  

select  DISTINCT a.mem_nbr ,  a.* , '-' [ ], b.dds, b.dde, b.id 
FROM ran a
join ran b  on  a.mem_nbr = b.mem_nbr          -- match by mem#
               AND     a.ID <> b.ID            -- itself
                  AND     b.ddS <= a.ddE        -- NOT    b.ddS >  a.ddE       
                  AND     a.ddS <= b.ddE        -- NOT    a.ddS >  b.ddE   
0
votes

And what, if you want to analyse such an overlap on a minute precision with 70m+ rows? the only solution i could make up myself was a time dimension table for the join

else the dublicate-handling became a headache .. and the processing cost where astronomical