How to fix wrong most repetitive value in PostgreSQL

Question

I have a query which selects the most repetitive value from my_table. The query is below:

SELECT
gid,
    max_height
    FROM
    (
    SELECT gid, max_height,
    ROW_NUMBER() OVER (PARTITION BY gid ORDER BY freq DESC) AS rn
    FROM (
            SELECT gid, max_height, COUNT(id) AS freq
            FROM my_table
            GROUP BY 1, 2
    order by 1,2
    ) hgt_freq
    ) ranked_hgt_req
WHERE rn = 1

while, my_table contains three columns like:

gid id  max_height
3   1   19.3
3   2   19.3
3   3   20.3
3   4   20.3
3   5   19.3
3   6   19.3
3   7   21.4
3   8   21.4
3   9   21.4
3   10  21.4
3   11  21.4
3   12  21.4
22  1   23.1
22  2   23.1
22  3   23.1
22  4   23.1
22  5   23.1
22  6   23.1
22  7   22.1
22  8   22.1
22  9   22.1
22  10  22.1
22  11  22.1
22  12  22.1
29  1   24
29  2   24
29  3   24
29  4   18.9
29  5   18.9
29  6   18.9
29  7   NULL
29  8   NULL
29  9   27.1
29  10  27.1
29  11  6.5
29  12  6.5

The problem with this query is that it returns the most repetitive value in descending order which is giving wrong value for the case of gid = 22. The output of query is:

gid    max_height
3      21.4
22     22.1
29     24.0

For the case of gid = 22, there are two most repetitive values i.e., 23.1 and 22.1. Thus query should return 23.1. Can anyone point me how to fix this problem or is there a better approach to do this? The process needs to automated for large records (gids).

klin klin · Accepted Answer · 2017-06-09T21:52:25

Use distinct on:

select distinct on(gid) gid, max_height
from (
    select gid, max_height, count(id) as freq
    from my_table
    group by 1, 2
    ) s
order by gid, freq desc

 gid | max_height 
-----+------------
   3 |       21.4
  22 |       23.1
  29 |         24
(3 rows)

From the documentation:

SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.

There are two most frequent values for gid=29. In cases like this you can choose in which order they should be presented by adding one more condition in order by:

select distinct on(gid) gid, max_height
from (
    select gid, max_height, count(id) as freq
    from my_table
    group by 1, 2
    ) s
order by gid, freq desc, max_height desc;

 gid | max_height 
-----+------------
   3 |       21.4
  22 |       23.1
  29 |         24
(3 rows)    

select distinct on(gid) gid, max_height
from (
    select gid, max_height, count(id) as freq
    from my_table
    group by 1, 2
    ) s
order by gid, freq desc, max_height;

 gid | max_height 
-----+------------
   3 |       21.4
  22 |       22.1
  29 |       18.9
(3 rows)

How to fix wrong most repetitive value in PostgreSQL

1 Answers