0
votes

This very simple SQL can calculate averages, medians etc. for a well defined periods, like year, month, quarter, week, day:

SELECT
  date_trunc('year', t.time2), -- or hour, day, week, month, year
  count(1), 
  percentile_cont(0.25) within group (order by t.price) as Q1,
  percentile_cont(0.5) within group (order by t.price) as Q2,
  percentile_cont(0.75) within group (order by t.price) as Q3,
  avg(t.price) as A,
  min(t.price) as Mi,
  max(t.price) as Mx

FROM my_table AS t
GROUP BY 1
ORDER BY date_trunc

The table contains list of individual transactions with date (timestamp) and price (bigint).

However, I am struggling to adapt it to calculate running/ moving values (eg. 4 weeks, or 6 months, or 2 quarters, or 12 months). How can this be achieved?

EDIT This is how data looks like:

enter image description here

And this is expected result:

enter image description here

EDIT 2:

Another issue I encounter is that there should be a full set of data included in moving avg, median, etc. calculations.

What I mean is that if data series starts in Jan-2000, then the first '12 month moving avg' that is meaningful can only be calculated in Dec-2000 (ie. first month that contains full 12 months of data). In case of 3 months moving avg, the first meaningful value will be in Mar-2000, etc.

So, I am thinking, that the logic for this query should be to:

1) determine start and end date to use in calculating avg, median etc stats , then

2) Loop over avg, median, etc. calculations FOR EACH start-end date pair.

To illustrate, the first part could be:

WITH range_values AS ( -- get min and max values for the data series
  SELECT date_trunc('month', min(time2)) as minval,
         date_trunc('month', max(time2)) as maxval
  FROM my_table),
period_range(d) AS ( -- generate complete list of periods eg. weeks, months, years for the data series 
  SELECT generate_series(minval, maxval, '1 month'::interval) as timeint
  FROM range_values
),
lookup_range AS ( -- generate start-end date pairs based on the data series
        select d as enddate, d- interval '11month' as startdate
from period_range
)
SELECT startdate, enddate
from lookup_range, range_values as p
where enddate  >= p.minval + interval '11month'; -- clip date range to calculate 12 months avg using 12 months of data only

The second part could be (not a valid query but just to illustrate the logic):

SELECT
  count(1),
  percentile_cont(0.5) within group (order by t.price) as median_price,
  avg(t.price) as avg_price
FROM my_table as t, lookup_range as l
WHERE t.time2>= 'startdate' AND t.time2 < 'enddate'  

Now, the challenge is how to combine the two? And how make it working with the minimum lines of code?

2
could you please put some data sample that you have in and corresponding data that you want out?Jaisus
You can use a window function using a frame definition similar to range between '1 month' preceding and current row but without some more sample data and expected output this is hard to tella_horse_with_no_name
Further info added. My understanding is that to use 'range between...' option data have to be aggregated first, eg sum of items per month. Then it can be added (eg 'add 12 rows to calculate total') but this would not work for medians?edaus

2 Answers

4
votes

I'd first aggregate by month an then calculate the moving average:

SELECT mon,
       sum(s_price) OVER w / sum(c_price) OVER w
FROM (SELECT date_trunc('month', time2::timestamp) AS mon,
             sum(price) AS s_price,
             count(price) AS c_prize
      FROM my_table
      GROUP BY date_trunc('month', time2::timestamp)) AS q
WINDOW w AS (ORDER BY mon
             RANGE BETWEEN '6 months'::interval PRECEDING
                       AND '6 months'::interval FOLLOWING);
0
votes

If anybody is looking for a solution to calculate 1,2,3,4,..6,...12 years/quarters/months/weeks/days/hours moving average, median, percentiles, etc. summary stats in a single go, here is the answer:

WITH grid AS (
      SELECT end_time, start_time
      FROM (

            SELECT end_time
          , lag(end_time, 12, 'infinity') OVER (ORDER BY end_time) AS start_time
            FROM (

                SELECT
                generate_series(date_trunc('month', min(time2))
              , date_trunc('month', max(time2)) + interval '1 month', interval '1 month') AS end_time
                FROM   my_table

                ) sub

           ) sub2

      WHERE end_time > start_time

)

SELECT
    to_char(date_trunc('month',a.end_time - interval '1 month'), 'YYYY-MM') as d
  , count(e.time2)
  , percentile_cont(0.25) within group (order by e.price) as Q1
  , percentile_cont(0.5) within group (order by e.price) as median
  , percentile_cont(0.75) within group (order by e.price) as Q3
  , avg(e.price) as Aver
  , min(e.price) as Mi
  , max(e.price) as Mx

FROM grid a

LEFT JOIN my_table e ON e.time2 >= a.start_time

                   AND e.time2 <  a.end_time

GROUP  BY end_time
ORDER  BY d DESC

Note, the table contains a list of individual time records (like sales transactions, etc), as in the example presented in the actual question.

And this bit:

to_char(date_trunc('month',a.end_time - interval '1 month'), 'YYYY-MM') as d

is only for display purposes. That is, the convention in PostgreSQL is that "end of the month" is actually "0 hour" of the next month (.ie. end of Oct 2019 is "2019.11.01 at 00:00:00"). The same applies to any time range (e.g. end of 2019 is actually "2020.01.01 at 00:00:00"). So, if "- interval '1 month' " is not included, the 12 months moving stats ending October 2019 will be shown as "for" 1st November 2019 at 00:00:00 (truncated to 2019-11).