Another possible way of doing this is
;
--Ensure that any immediately preceding statement is terminated with a semicolon above
WITH cte
AS (SELECT ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3
ORDER BY ( SELECT 0)) RN
FROM #MyTable)
DELETE FROM cte
WHERE RN > 1;
I am using ORDER BY (SELECT 0)
above as it is arbitrary which row to preserve in the event of a tie.
To preserve the latest one in RowID
order for example you could use ORDER BY RowID DESC
Execution Plans
The execution plan for this is often simpler and more efficient than that in the accepted answer as it does not require the self join.
This is not always the case however. One place where the GROUP BY
solution might be preferred is situations where a hash aggregate would be chosen in preference to a stream aggregate.
The ROW_NUMBER
solution will always give pretty much the same plan whereas the GROUP BY
strategy is more flexible.
Factors which might favour the hash aggregate approach would be
- No useful index on the partitioning columns
- relatively fewer groups with relatively more duplicates in each group
In extreme versions of this second case (if there are very few groups with many duplicates in each) one could also consider simply inserting the rows to keep into a new table then TRUNCATE
-ing the original and copying them back to minimise logging compared to deleting a very high proportion of the rows.
DELETE FROM
a CTE term directly. See stackoverflow.com/q/18439054/398670 – Craig RingerROWID()
function by the RowID column, if any) – maf-soft