3
votes

I have the following query which works, binning timestamped "observations" into buckets whose boundaries are defined by the bins table:

SELECT
  count(id),
  width_bucket(
      time :: TIMESTAMP,
      (SELECT ARRAY(SELECT start_time
                    FROM bins
                    WHERE owner_id = 'some id'
                    ORDER BY start_time ASC) :: TIMESTAMP[])
  ) bucket
FROM observations
WHERE owner_id = 'some id'
GROUP BY bucket
ORDER BY bucket;

I would like to modify this to allow for querying arbitrary n-minute bins starting from a specified timestamp, rather than having to pull from from an actual "bins" table.

That is, given a start time, a "bin width" in minutes, and a number of bins, is there a way I can generate the array of timestamps to pass into the width_bucket function?

Alternatively, is there a different/simpler approach to get the same results?

3

3 Answers

3
votes

Use the function generate_series(start, stop, step interval), e.g.

select array(
    select generate_series(
        timestamp '2018-04-15 00:00', 
        '2018-04-15 01:00', 
        '30 minutes'))

                               array                                
---------------------------------------------------------------------
 {"2018-04-15 00:00:00","2018-04-15 00:30:00","2018-04-15 01:00:00"}
(1 row)

Example in Db<>fiddle.

1
votes

The above answers seem to do what you want, but as of PostgreSQL 14, there is now a function date_bin just for binning timestamps.

Quoting the documentation:

date_bin(stride,source,origin)

source is a value expression of type timestamp or timestamp with time zone. (Values of type date are cast automatically to timestamp.) stride is a value expression of type interval. The return value is likewise of type timestamp or timestamp with time zone, and it marks the beginning of the bin into which the source is placed.

Examples:

SELECT date_bin('15 minutes', TIMESTAMP '2020-02-11 15:44:17', TIMESTAMP > '2001-01-01');
Result: 2020-02-11 15:30:00

SELECT date_bin('15 minutes', TIMESTAMP '2020-02-11 15:44:17', TIMESTAMP '2001-01-01 00:02:30');
Result: 2020-02-11 15:32:30

In the case of full units (1 minute, 1 hour, etc.), it gives the same result as the analogous date_trunc call, but the difference is that date_bin can truncate to an arbitrary interval.

The stride interval must be greater than zero and cannot contain units of month or larger.

I would like to call special attention to the line

The return value [...] marks the beginning of the bin into which the source is placed.

This means that input timestamps will always be binned by "rounding down", rather than binning to whichever bin is closest. E.g. if you do:

SELECT date_bin('1 hour', '2021-10-13 00:59:59', '2021-10-13 00:00:00');

Then the result will be 2020-10-13 00:00:00 (rounded down by 59 minutes and 59 seconds), NOT 2021-10-13 01:00:00 (which is only one second away from the supplied timestamp). So the date_bin function does something slightly different than exactly what you ask for, but I figure this is good to post for anyone coming here in the future.

0
votes

A different approach without a series:

Divide the difference of time and start by the width of the bin (5 minutes in the example) and add 1 because the first bucket of width_bucket(...) is 1 not 0.

floor(extract(epoch from (time - '2019-06-04 00:00'::timestamp)) / (5 * 60) ) + 1 as bucket

Getting the start of the bin is also possible

to_timestamp(floor(extract(epoch from a.time) / (5 * 60)) * (5 * 60)) as bin_start

Putting this all together:

SELECT
  count(id),
  floor(extract(epoch from (time - '2019-06-04 00:00'::timestamp)) / (5 * 60) ) + 1 as bucket,
  to_timestamp(floor(extract(epoch from time) / (5 * 60)) * (5 * 60)) as bin_start
FROM observations
WHERE owner_id = 'some id'
GROUP BY bucket, bin_start
ORDER BY bucket;