29
votes

I have the following table in my Postgresql 9.1 database:

select * from ro;
date       |  shop_id | amount 
-----------+----------+--------
2013-02-07 |     1001 |      3
2013-01-31 |     1001 |      2
2013-01-24 |     1001 |      1
2013-01-17 |     1001 |      5
2013-02-10 |     1001 |     10
2013-02-03 |     1001 |      4
2012-12-27 |     1001 |      6
2012-12-20 |     1001 |      8
2012-12-13 |     1001 |      4
2012-12-06 |     1001 |      3
2012-10-29 |     1001 |      3

I am trying to get a moving average comparing data against last 3 Thursdays without including the current Thursday. Here's my query:

select date, shop_id, amount, extract(dow from date),
avg(amount) OVER (PARTITION BY extract(dow from date) ORDER BY date DESC
                      ROWS BETWEEN 0 PRECEDING AND 2 FOLLOWING)                          
from ro
where extract(dow from date) = 4

This is the result given

date       |  shop_id | amount | date_part |        avg         
-----------+----------+--------+-----------+--------------------
2013-02-07 |     1001 |      3 |         4 | 2.0000000000000000
2013-01-31 |     1001 |      2 |         4 | 2.6666666666666667
2013-01-24 |     1001 |      1 |         4 | 4.0000000000000000
2013-01-17 |     1001 |      5 |         4 | 6.3333333333333333
2012-12-27 |     1001 |      6 |         4 | 6.0000000000000000
2012-12-20 |     1001 |      8 |         4 | 5.0000000000000000
2012-12-13 |     1001 |      4 |         4 | 3.5000000000000000
2012-12-06 |     1001 |      3 |         4 | 3.0000000000000000

I expect

date       |  shop_id | amount | date_part |        avg         
-----------+----------+--------+-----------+--------------------
2013-02-07 |     1001 |      3 |         4 | 2.6666666666666667
2013-01-31 |     1001 |      2 |         4 | 4.0000000000000000
2013-01-24 |     1001 |      1 |         4 | 6.3333333333333333
2013-01-17 |     1001 |      5 |         4 | 6.0000000000000000
2012-12-27 |     1001 |      6 |         4 | 5.0000000000000000
2012-12-20 |     1001 |      8 |         4 |
2012-12-13 |     1001 |      4 |         4 |
2012-12-06 |     1001 |      3 |         4 |
2
+1 good question - Pg version, sample data, expected results. Thanks! Converted to SQLFiddle here: sqlfiddle.com/#!1/18891/1Craig Ringer
BTW, "date" is a terrible column name, since it's the name of a data type. Avoid using it. If you must use it, always qualify it with the table alias and double quote it, as shown here: sqlfiddle.com/#!1/18891/4Craig Ringer
Thanks Craig :) This is just a sample dataset from a very large table. I just like to get the query right first.Glicious

2 Answers

19
votes

SQL Fiddle

select
    "date",
    shop_id,
    amount,
    extract(dow from date),
    case when
        row_number() over (order by date) > 3
        then
            avg(amount) OVER (
                ORDER BY date DESC
                ROWS BETWEEN 1 following AND 3 FOLLOWING
            )
        else null end
from (
    select *
    from ro
    where extract(dow from date) = 4
) s

What is wrong with the OP's query is the frame specification:

ROWS BETWEEN 0 PRECEDING AND 2 FOLLOWING

Other than that my query avoids unneeded computing by filtering Thursdays before applying the expensive window functions.

If it is necessary to partition by shop_id then obviously add the partition by shop_id to both functions, avg and row_number.

12
votes

I think a better answer might be:

SELECT date, shop_id, amount, 
    extract(dow from date) AS dow,
    CASE WHEN count(amount) OVER w = 3 
        THEN avg(amount) OVER w END AS average_amt             
FROM ro
WHERE extract(dow from date) = 4 
WINDOW w AS (ORDER BY date DESC ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING)

I think it is cleaner to use the same window for both checking the number of rows in the window and taking the average. (This also saves two window aggregations, as can be seen in the original answer.)

Regarding the claim in the earlier answer that "my query avoids unneeded computing by filtering Thursdays before applying the expensive window functions", this also applies to the query suggested by the OP and to my query, as appending EXPLAIN to either shows.