Count distinct minutes for overlapping time sessions in Snowflake

Question

I have a table like the following:

person	Session	session_start	session_end	half_hour_start	half_hour_end
A	A001	9/13/2020 7:58:00 PM	9/13/2020 8:10:00 PM	9/13/2020 8:00:00 PM	9/13/2020 8:30:00 PM
A	A002	9/13/2020 8:02:00 PM	9/13/2020 8:13:00 PM	9/13/2020 8:00:00 PM	9/13/2020 8:30:00 PM
A	A003	9/13/2020 8:27:00 PM	9/13/2020 8:28:00 PM	9/13/2020 8:00:00 PM	9/13/2020 8:30:00 PM
B	B001	9/13/2020 8:20:00 PM	9/13/2020 8:28:00 PM	9/13/2020 8:00:00 PM	9/13/2020 8:30:00 PM
B	B002	9/13/2020 8:28:00 PM	9/13/2020 8:43:00 PM	9/13/2020 8:00:00 PM	9/13/2020 8:30:00 PM

The goal is to count distinct minutes per person across all sessions within a 30-minute block (half_hour_start to half_hour_end). Counting starts at minute 00 and ends at minute 29 (so there are 30 distinct minutes in total).

So even if a person had a session starting at 9/13/2020 8:00:01 PM and ending at 9/13/2020 8:00:05 PM, that person still gets credit for 1 minute: minute '00'. We're interested not in the count of full minutes, but in the count of all distinct minutes in which a session took place, even partially.

I need to get results like:

---old version---

person	distinct_minutes_count
A	14
B	10

---new version---

person	distinct_minutes_count
A	16
B	10

(which could be coming from:

person	Session	session_start	session_end	half_hour_start	half_hour_end	distinct_minutes_count_per_person
A	A001	9/13/2020 7:58:00 PM	9/13/2020 8:10:00 PM	9/13/2020 8:00:00 PM	9/13/2020 8:30:00 PM	16
A	A002	9/13/2020 8:02:00 PM	9/13/2020 8:13:00 PM	9/13/2020 8:00:00 PM	9/13/2020 8:30:00 PM	16
A	A003	9/13/2020 8:27:00 PM	9/13/2020 8:28:00 PM	9/13/2020 8:00:00 PM	9/13/2020 8:30:00 PM	16
B	B001	9/13/2020 8:20:00 PM	9/13/2020 8:28:00 PM	9/13/2020 8:00:00 PM	9/13/2020 8:30:00 PM	10
B	B002	9/13/2020 8:28:00 PM	9/13/2020 8:43:00 PM	9/13/2020 8:00:00 PM	9/13/2020 8:30:00 PM	10

)

The intermediate steps needed, probably, are:

person	Session	session_start	session_end	half_hour_start	half_hour_end	distinct_minute_per_session	distinct_minutes_count_per_session	distinct_minute_per_person	distinct_minutes_count
A	A001	9/13/2020 7:58:00 PM	9/13/2020 8:10:00 PM	9/13/2020 8:00:00 PM	9/13/2020 8:30:00 PM	00,01,02,03,04,05,06,07,08,09,10	11	00,01,02,03,04,05,06,07,08,09,10,11,12,13,27,28	16
A	A002	9/13/2020 8:02:00 PM	9/13/2020 8:13:00 PM	9/13/2020 8:00:00 PM	9/13/2020 8:30:00 PM	02,03,04,05,06,07,08,09,10,11,12,13	12	00,01,02,03,04,05,06,07,08,09,10,11,12,13,27,28	16
A	A003	9/13/2020 8:27:00 PM	9/13/2020 8:28:00 PM	9/13/2020 8:00:00 PM	9/13/2020 8:30:00 PM	27,28	2	00,01,02,03,04,05,06,07,08,09,10,11,12,13,27,28	16
B	B001	9/13/2020 8:20:00 PM	9/13/2020 8:28:00 PM	9/13/2020 8:00:00 PM	9/13/2020 8:30:00 PM	20,21,22,23,24,25,26,27,28	9	20,21,22,23,24,25,26,27,28,29	10
B	B002	9/13/2020 8:28:00 PM	9/13/2020 8:43:00 PM	9/13/2020 8:00:00 PM	9/13/2020 8:30:00 PM	28,29	2	20,21,22,23,24,25,26,27,28,29	10

But I don't see an option for creating a list of values in a column in Snowflake.

Hi - shouldn't the distinct_minutes for A be 13, not 14? 10 for A001 and then 3 for the non-overlapping part of A002? I'm also not clear why you are keeping the Session in your resultset as the minutes apply to the user not the session i.e. there aren't 14 (or, rather, 13) distinct minutes for session A001 - there are either 10 (if A001 has priority over A002) or 2 (if A002 has priority over A001). Please could you explain in more detail what you are doing here? — NickW
Hi @NickW! Thank you for commenting. The count of minutes is correct: we assign the following indexes - minute 00 [1], minute 01 [2], etc., so that if a person started at minute 00 and ended at minute 10, (s)he had minutes 00-10, 11 in total. That's why we cut our 30-minute block at minute 29, not 30. I added the intermediate count of distinct minutes per session for convenience. No session has priority over another - we're interested in total results per person, not per session. — user14276110

Felipe Hoffa · Accepted Answer · 2021-02-12T08:36:09

To generate multiple rows, you can use a JS table UDF:

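CREATE OR REPLACE FUNCTION generate_minutes(STARTING timestamp, ENDING timestamp)
RETURNS TABLE (V VARCHAR)
LANGUAGE JAVASCRIPT
AS '{
    processRow: function get_params(row, rowWriter, context){
       // Emit one row per minute index (minutes since the epoch), inclusive on both ends.
       // Both bounds are floored so timestamps with seconds still map to whole-minute indexes.
       for(var i=Math.floor(row.STARTING/60/1000); i<=Math.floor(row.ENDING/60/1000); i++) {
           rowWriter.writeRow({V: i});
       }
    }
}';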


with data as (
select 'A' person, 'A001' session,  '9/13/2020 7:58:00'::timestamp session_start,   '9/13/2020 8:10:00'::timestamp session_end, '9/13/2020 8:00:00'::timestamp half_hour_start, '9/13/2020 8:30:00'::timestamp half_hour_end 
union all select 'A', 'A002', '9/13/2020 8:02:00', '9/13/2020 8:13:00', '9/13/2020 8:00:00', '9/13/2020 8:30:00'
union all select 'A', 'A003', '9/13/2020 8:27:00', '9/13/2020 8:28:00', '9/13/2020 8:00:00', '9/13/2020 8:30:00'
)

select person, count(distinct x.v) distinct_minutes
from data
  , table(generate_minutes(
      greatest(session_start, half_hour_start)
      , least(session_end, timestampadd(minute, -1, half_hour_end)))
     ) x
where session_start<=half_hour_end and session_end >= half_hour_start
group by person
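
To sanity-check the logic outside Snowflake, here is a minimal sketch of the same clamp-then-enumerate idea in plain JavaScript, run against the sample rows from the question. The helper names (minutesInBlock, distinctMinutesPerPerson, at) are hypothetical and exist only for this illustration, and the timestamps are assumed to be UTC epoch milliseconds, which may not match how your session data is stored.

```javascript
// Sketch only: mirrors the greatest/least clamping and the minute loop of the UDF.
const MS_PER_MINUTE = 60 * 1000;

// Hypothetical helper: minute indexes (minutes since the epoch) that a session
// touches inside the block [halfHourStart, halfHourEnd), counting partial minutes.
function minutesInBlock(sessionStart, sessionEnd, halfHourStart, halfHourEnd) {
  const from = Math.max(sessionStart, halfHourStart);
  // The last creditable minute is minute 29, i.e. one minute before the block end.
  const to = Math.min(sessionEnd, halfHourEnd - MS_PER_MINUTE);
  const minutes = [];
  for (let m = Math.floor(from / MS_PER_MINUTE); m <= Math.floor(to / MS_PER_MINUTE); m++) {
    minutes.push(m);
  }
  return minutes;
}

// Hypothetical helper: union the minute indexes per person and count them.
function distinctMinutesPerPerson(rows) {
  const perPerson = new Map();
  for (const r of rows) {
    if (!perPerson.has(r.person)) perPerson.set(r.person, new Set());
    for (const m of minutesInBlock(r.start, r.end, r.blockStart, r.blockEnd)) {
      perPerson.get(r.person).add(m);
    }
  }
  const counts = {};
  for (const [person, minuteSet] of perPerson) counts[person] = minuteSet.size;
  return counts;
}

// Sample rows from the question (9/13/2020, PM times written in 24-hour form).
const at = (h, min, s = 0) => Date.UTC(2020, 8, 13, h, min, s);
const blockStart = at(20, 0);
const blockEnd = at(20, 30);
const rows = [
  { person: 'A', start: at(19, 58), end: at(20, 10), blockStart, blockEnd },
  { person: 'A', start: at(20, 2),  end: at(20, 13), blockStart, blockEnd },
  { person: 'A', start: at(20, 27), end: at(20, 28), blockStart, blockEnd },
  { person: 'B', start: at(20, 20), end: at(20, 28), blockStart, blockEnd },
  { person: 'B', start: at(20, 28), end: at(20, 43), blockStart, blockEnd },
];

const counts = distinctMinutesPerPerson(rows);
console.log(counts); // { A: 16, B: 10 }
```

The Set plays the role of count(distinct x.v) in the query: overlapping sessions contribute the same minute index only once per person.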
