1
votes

Table - col_pk, col1, col2,col3, col4, col_date_updated

This table has some rows with duplicate column values for col2 and col3. I want to keep those rows with col_date_updated is latest(max).

Eg:

col_pk, col1, col2,  col3, col4,   col_date_updated
1,      A,    hello, now,  200.00, 2017-12-12 15:09:44.437546
2,      B,    hello, now,  490.00, 2017-12-12 15:09:42.437065
3,      C,    hi,    now,  300.00, 2017-12-12 15:09:41.436617
4,      D,    hello, now,  250.00, 2017-12-12 15:09:45.436617
5,      E,    hi,    now,  250.00, 2017-12-12 10:09:41.436617

Expected Result:

col_pk, col1, col2,  col3, col4,   col_date_updated
3,      C,    hi,    now,  300.00, 2017-12-12 15:09:41.436617
4,      D,    hello, now,  250.00, 2017-12-12 15:09:45.436617
5
Do you want a query returning that result, or do you want to permanently delete (remove) those rows from the table?a_horse_with_no_name

5 Answers

1
votes

Check this.

SELECT DISTINCT ON (col2, col3) t.*
FROM table t
ORDER BY col_date_updated DESC

apply distinct on col2 and col3 cause you want them unique and keep the latest with order by desc

0
votes

If you just want to select to get your expected output, then ROW_NUMBER comes in handy:

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY col2, col3
        ORDER BY col_date_updated DESC) rn
    FROM yourTable
)
SELECT col_pk, col1, col2,  col3, col4, col_date_updated
FROM cte
WHERE rn = 1;

If you instead want to delete the other records, then we can also reuse the CTE:

DELETE FROM yourTable WHERE col_pk IN (SELECT col_pk FROM cte WHERE rn > 1);
0
votes

You could try something like this.

SELECT t.*
  FROM yourtable t
 WHERE col_date_updated IN (SELECT MAX (col_date_updated)
                    FROM yourtable i
                   WHERE t.col2 = i.col2 AND t.col3 = i.col3);

So, If you wish to delete other records, you may use this.

DELETE 
  FROM yourtable t
 WHERE col_date_updated NOT IN (SELECT MAX (col_date_updated)
                    FROM yourtable i
                   WHERE t.col2 = i.col2 AND t.col3 = i.col3);

DEMO

0
votes

If you want to suppress all but the most recent rows for any {col2,col3}:


SELECT *
FROM thetable zt
WHERE NOT EXISTS (
        -- If a record exists with the same col2,col3,
        -- but a more recent date than zt.col_date_updated
        -- then zt.* cannot be the most recent one
        SELECT *
        FROM thetable nx
        WHERE nx.col2 = zt.col2 -- same value
        AND nx.col3 = zt.col3   -- same value
        AND nx.col_date_updated > zt.col_date_updated -- more recent
        );

If you want to physically delete all but the most recent rows for the same {col2,col3}:


DELETE
FROM thetable zt
WHERE EXISTS (
        -- If a record exists with the same col2,col3,
        -- but a more recent date than zt.t.col_date_updated
        -- then zt.* cannot be the most recent one
        -- and we can delete zt.
        SELECT *
        FROM thetable nx
        WHERE nx.col2 = zt.col2 -- same value
        AND nx.col3 = zt.col3   -- same value
        AND nx.col_date_updated > zt.col_date_updated -- more recent
        );
-1
votes

This is fastest way:

SELECT * FROM tablename WHERE col_pk IN 
       (SELECT col_pk FROM 
               (SELECT col_pk, ROW_NUMBER() OVER (partition BY col2, col3 ORDER BY col_date_updated) AS rnum
        FROM tablename) t
 WHERE t.rnum > 1);

if you want delete:

DELETE FROM tablename WHERE col_pk IN 
       (SELECT col_pk FROM 
               (SELECT col_pk, ROW_NUMBER() OVER (partition BY col2, col3 ORDER BY col_date_updated) AS rnum
        FROM tablename DESC) t
 WHERE t.rnum > 1);