1
votes

I want to generate a series of timestamp ranges over an interval, and query a table for all of the overlapping events. I am almost there, but still pretty new to SQL and cannot seem to get over this hump.

To test with, I build a table with a bunch of events, each taking place during a date time range:

CREATE TABLE event (id int, order_dates tsrange, flow int);
INSERT INTO event VALUES
    (1,'[2021-09-01 10:55:01,2021-09-04 15:16:01)',50),
    (2,'[2021-08-15 20:14:27,2021-08-18 22:19:27)',36),
    (3,'[2021-08-03 12:51:47,2021-08-05 11:28:47)',41),
    (4,'[2021-08-17 09:14:30,2021-08-20 13:57:30)',29),
    (5,'[2021-08-02 20:29:07,2021-08-04 19:19:07)',27),
    (6,'[2021-08-26 02:01:13,2021-08-26 08:01:13)',39),
    (7,'[2021-08-25 23:03:25,2021-08-27 03:22:25)',10),
    (8,'[2021-08-12 23:40:24,2021-08-15 08:32:24)',26),
    (9,'[2021-08-24 17:19:59,2021-08-29 00:48:59)',5),
    (10,'[2021-09-01 02:01:17,2021-09-02 12:31:17)',48),
    (11,'[2021-08-16 01:30:17,2021-08-16 01:46:17)',37),
    (12,'[2021-08-06 09:35:23,2021-08-10 09:19:23)',21),
    (13,'[2021-08-17 06:12:21,2021-08-20 04:40:21)',12),
    (14,'[2021-08-21 05:45:03,2021-08-23 16:24:03)',38),
    (15,'[2021-08-10 01:55:48,2021-08-10 23:23:48)',39),
    (16,'[2021-08-28 06:26:59,2021-08-29 21:25:59)',28);

I can use the following query to generate a range of timestamps at 60 minute (or any value) intervals and get all of the events that overlap (and get the count of events that take place at each timestamp). It works great:

SELECT row_number() OVER () AS id,
      grid.bin,
      count(DISTINCT t.id) AS id_count,
FROM (
   SELECT generate_series('2021-08-01 00:00:00'::timestamp,
                          '2021-08-07 23:59:59'::timestamp, interval '60 min') AS bin
   FROM event
   ) grid
LEFT JOIN event t ON t."order_dates" @> grid.bin
GROUP  BY grid.bin
ORDER  BY grid.bin;

https://www.db-fiddle.com/f/5P4G7DCY2vdZfLFSi5cA9p/0

I also know I can use this query to generate timestamp ranges with a given interval rather than just timestamps.

SELECT tsrange((lag(bin) OVER()), bin, '[)')
FROM generate_series(
    '2021-08-01 12:00:00'::timestamp,
    '2021-08-07 12:00:00',
    '60 minute')
AS a OFFSET 1;

https://www.db-fiddle.com/f/nPKTb82SknB3XYi5exrtkz/1

But I cannot seem to figure out how to combine these two queries, swapping out the generate_series portion of the first query with the second query. I do know I'll need to use && (overlap) rather than @> (contains element).

Any help with figuring out how to approach this will be appreciated. It's probably something simple, but I'm not sure what to even search for in solving something like this.

1

1 Answers

2
votes

Use generate_series in the FROM clause:

SELECT row_number() OVER (ORDER BY lower(grid.bin)) AS id,  -- needs ORDER BY
       grid.bin,
       count(t.id) AS id_count  -- no need for DISTINCT
FROM (SELECT tsrange(
                lag(t) OVER (ORDER BY t),
                t
             )
      FROM generate_series(
              '2021-08-01 00:00:00'::timestamp,
              '2021-08-07 23:59:59'::timestamp,
              interval '60 min'
           ) AS times(t)
     ) AS grid(bin)
LEFT JOIN event t ON t."order_dates" @> grid.bin
GROUP BY grid.bin
ORDER BY lower(grid.bin);