0
votes

Is there a case where adding DISTINCT would change the results of a SELECT query that uses a GROUP BY clause?

Group by and distinct produce similar execution plans.

From my understanding, tables that use a GROUP BY clause can only have columns from the GROUP BY or aggregate functions.

List of aggregate functions appears to be deterministic and combinations from the GROUP BY would be unique so my assumption is that it would be redundant.

EDIT 1: Adding the DISTINCT keyword directly after SELECT. Not anywhere in the query like @lad2025's example: SELECT name, COUNT(DISTINCT col) ... GROUP BY name.

3

3 Answers

3
votes

You are under no obligation to SELECT all the GROUP BY columns so in this case it would change the results.

SELECT COUNT(*)
FROM sys.objects
GROUP BY schema_id, name

--- or

SELECT DISTINCT COUNT(*)
FROM sys.objects
GROUP BY schema_id, name
1
votes

The groups, as defined by the expressions and columns that appear in the group by clause, will be unique in the result set. As long as all those same columns are included in the select list then distinct will be redundant. As noted by Martin Smith this is not required though.

0
votes

Yes, it can change result when you use DISTINCT with aggregation function:

SELECT name, COUNT(col) AS result
FROM table
GROUP BY name

vs

SELECT name, COUNT(DISTINCT col) AS result
FROM table
GROUP BY name

In other cases like:

SELECT DISTINCT name
FROM table
GROUP BY name

DISTINCT is almost always redundant.

EDIT:

Corner case (when GROUP BY and SELECT column list does not match):

CREATE TABLE #tab(col1 INT, col2 INT);

INSERT INTO #tab
VALUES (1,1), (1,1), (2,1), (2,2)

SELECT DISTINCT col2
FROM #tab
GROUP BY col1, col2

SELECT col2
FROM #tab
GROUP BY col1, col2;

LiveDemo

Output:

╔══════╗                 ╔══════╗
║ col2 ║                 ║ col2 ║
╠══════╣      vs         ╠══════╣    
║    1 ║                 ║    1 ║           
║    2 ║                 ║    1 ║           
╚══════╝                 ║    2 ║
                         ╚══════╝