250
votes

I want to run this query:

SELECT DISTINCT ON (address_id) purchases.address_id, purchases.*
FROM purchases
WHERE purchases.product_id = 1
ORDER BY purchases.purchased_at DESC

But I get this error:

PG::Error: ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions

Adding address_id as first ORDER BY expression silences the error, but I really don't want to add sorting over address_id. Is it possible to do without ordering by address_id?

7
Your order clause has purchased_at not address_id.Can you make your question clear.Teja
my order has purchase because i want it, but postgres also asks for address(see error message).sl_bug
Personally I think requiring DISTINCT ON to match ORDER BY is very questionable, as there are a variety of legitimate use cases for having them differ. There is a post on postgresql.uservoice trying to change this for those who feel similarly. postgresql.uservoice.com/forums/21853-general/suggestions/…semicolon
got the exact same issue, and facing the same limtation. At the moment I have broken it into a sub-query and then ordering, but it feels dirty.Guy Park

7 Answers

255
votes

Documentation says:

DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. [...] Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. [...] The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).

Official documentation

So you'll have to add the address_id to the order by.

Alternatively, if you're looking for the full row that contains the most recent purchased product for each address_id and that result sorted by purchased_at then you're trying to solve a greatest N per group problem which can be solved by the following approaches:

The general solution that should work in most DBMSs:

SELECT t1.* FROM purchases t1
JOIN (
    SELECT address_id, max(purchased_at) max_purchased_at
    FROM purchases
    WHERE product_id = 1
    GROUP BY address_id
) t2
ON t1.address_id = t2.address_id AND t1.purchased_at = t2.max_purchased_at
ORDER BY t1.purchased_at DESC

A more PostgreSQL-oriented solution based on @hkf's answer:

SELECT * FROM (
  SELECT DISTINCT ON (address_id) *
  FROM purchases 
  WHERE product_id = 1
  ORDER BY address_id, purchased_at DESC
) t
ORDER BY purchased_at DESC

Problem clarified, extended and solved here: Selecting rows ordered by some column and distinct on another

65
votes

You can order by address_id in an subquery, then order by what you want in an outer query.

SELECT * FROM 
    (SELECT DISTINCT ON (address_id) purchases.address_id, purchases.* 
    FROM "purchases" 
    WHERE "purchases"."product_id" = 1 ORDER BY address_id DESC ) 
ORDER BY purchased_at DESC
56
votes

A subquery can solve it:

SELECT *
FROM  (
    SELECT DISTINCT ON (address_id) *
    FROM   purchases
    WHERE  product_id = 1
    ) p
ORDER  BY purchased_at DESC;

Leading expressions in ORDER BY have to agree with columns in DISTINCT ON, so you can't order by different columns in the same SELECT.

Only use an additional ORDER BY in the subquery if you want to pick a particular row from each set:

SELECT *
FROM  (
    SELECT DISTINCT ON (address_id) *
    FROM   purchases
    WHERE  product_id = 1
    ORDER  BY address_id, purchased_at DESC  -- get "latest" row per address_id
    ) p
ORDER  BY purchased_at DESC;

If purchased_at can be NULL, use DESC NULLS LAST - and match your index for best performance. See:

Related, with more explanation:

11
votes

Window function may solve that in one pass:

SELECT DISTINCT ON (address_id) 
   LAST_VALUE(purchases.address_id) OVER wnd AS address_id
FROM "purchases"
WHERE "purchases"."product_id" = 1
WINDOW wnd AS (
   PARTITION BY address_id ORDER BY purchases.purchased_at DESC
   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
7
votes

For anyone using Flask-SQLAlchemy, this worked for me

from app import db
from app.models import Purchases
from sqlalchemy.orm import aliased
from sqlalchemy import desc

stmt = Purchases.query.distinct(Purchases.address_id).subquery('purchases')
alias = aliased(Purchases, stmt)
distinct = db.session.query(alias)
distinct.order_by(desc(alias.purchased_at))
0
votes

It can also be solved using the following query along with other answers.

WITH purchase_data AS (
        SELECT address_id, purchased_at, product_id,
                row_number() OVER (PARTITION BY address_id ORDER BY purchased_at DESC) AS row_number
        FROM purchases
        WHERE product_id = 1)
SELECT address_id, purchased_at, product_id
FROM purchase_data where row_number = 1
-2
votes

You can also done this by using group by clause

   SELECT purchases.address_id, purchases.* FROM "purchases"
    WHERE "purchases"."product_id" = 1 GROUP BY address_id,
purchases.purchased_at ORDER purchases.purchased_at DESC