Count rows in partition with Order By

Question

I was trying to understand PARTITION BY in postgres by writing a few sample queries. I have a test table on which I run my query.

id integer | num integer
___________|_____________
1          | 4 
2          | 4
3          | 5
4          | 6

When I run the following query, I get the output as I expected.

SELECT id, COUNT(id) OVER(PARTITION BY num) from test;

id         | count
___________|_____________
1          | 2 
2          | 2
3          | 1
4          | 1

But, when I add ORDER BY to the partition,

SELECT id, COUNT(id) OVER(PARTITION BY num ORDER BY id) from test;

id         | count
___________|_____________
1          | 1 
2          | 2
3          | 1
4          | 1

My understanding is that COUNT is computed across all rows that fall into a partition. Here, I have partitioned the rows by num. The number of rows in the partition is the same, with or without an ORDER BY clause. Why is there a difference in the outputs?

In the second the case, postgre counts the number of rows with id lower or equal to the actual id — Radim Bača
@RadimBača is it something specific to postgres or is that how it is supposed to work? I don't understand how the query is interpreted the way you have described. — kishore
See the documentation on window functions, especially: By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. When ORDER BY is omitted the default frame consists of all rows in the partition. — Marth
Thanks for the info. I now understand what is happening. I missed the info provided in the documentation. — kishore

a_horse_with_no_name a_horse_with_no_name · Accepted Answer · 2018-06-20T10:18:46

When you add an order by to an aggregate used as a window function that aggregate turns into a "running count" (or whatever aggregate you use).

The count(*) will return the number of rows up until the "current one" based on the order specified.

The following query shows the different results for aggregates used with an order by. With sum() instead of count() it's a bit easier to see (in my opinion).

with test (id, num, x) as (
  values 
    (1, 4, 1),
    (2, 4, 1),
    (3, 5, 2),
    (4, 6, 2)
)
select id, 
       num,
       x,
       count(*) over () as total_rows, 
       count(*) over (order by id) as rows_upto,
       count(*) over (partition by x order by id) as rows_per_x,
       sum(num) over (partition by x) as total_for_x,
       sum(num) over (order by id) as sum_upto,
       sum(num) over (partition by x order by id) as sum_for_x_upto
from test;

will result in:

id | num | x | total_rows | rows_upto | rows_per_x | total_for_x | sum_upto | sum_for_x_upto
---+-----+---+------------+-----------+------------+-------------+----------+---------------
 1 |   4 | 1 |          4 |         1 |          1 |           8 |        4 |              4
 2 |   4 | 1 |          4 |         2 |          2 |           8 |        8 |              8
 3 |   5 | 2 |          4 |         3 |          1 |          11 |       13 |              5
 4 |   6 | 2 |          4 |         4 |          2 |          11 |       19 |             11

There are more examples in the Postgres manual

Count rows in partition with Order By

3 Answers