168
votes

Just curious about SQL syntax. So if I have

SELECT 
 itemName as ItemName,
 substring(itemName, 1,1) as FirstLetter,
 Count(itemName)
FROM table1
GROUP BY itemName, FirstLetter

This would be incorrect because

GROUP BY itemName, FirstLetter 

really should be

GROUP BY itemName, substring(itemName, 1,1)

But why can't we simply use the former for convenience?

10
that's allowed in PostgresqlMichael Buen
MySQL allows it alsoKip
which rdbms you are talking about?Shiwangini
This question is not exclusive to group by, but also to join, for examplephil294

10 Answers

343
votes

SQL is implemented as if a query was executed in the following order:

  1. FROM clause
  2. WHERE clause
  3. GROUP BY clause
  4. HAVING clause
  5. SELECT clause
  6. ORDER BY clause

For most relational database systems, this order explains which names (columns or aliases) are valid because they must have been introduced in a previous step.

So in Oracle and SQL Server, you cannot use a term in the GROUP BY clause that you define in the SELECT clause because the GROUP BY is executed before the SELECT clause.

There are exceptions though: MySQL and Postgres seem to have additional smartness that allows it.

33
votes

You could always use a subquery so you can use the alias; Of course, check the performance (Possible the db server will run both the same, but never hurts to verify):

SELECT ItemName, FirstLetter, COUNT(ItemName)
FROM (
    SELECT ItemName, SUBSTRING(ItemName, 1, 1) AS FirstLetter
    FROM table1
    ) ItemNames
GROUP BY ItemName, FirstLetter
21
votes

At least in PostgreSQL you can use the column number in the resultset in your GROUP BY clause:

SELECT 
 itemName as ItemName,
 substring(itemName, 1,1) as FirstLetter,
 Count(itemName)
FROM table1
GROUP BY 1, 2

Of course this starts to be a pain if you are doing this interactively and you edit the query to change the number or order of columns in the result. But still.

15
votes

SQL Server doesn't allow you to reference the alias in the GROUP BY clause because of the logical order of processing. The GROUP BY clause is processed before the SELECT clause, so the alias is not known when the GROUP BY clause is evaluated. This also explains why you can use the alias in the ORDER BY clause.

Here is one source for information on the SQL Server logical processing phases.

13
votes

I'm not answering why it is so, but only wanted to show a way around that limitation in SQL Server by using CROSS APPLY to create the alias. You then use it in the GROUP BY clause, like so:

SELECT 
 itemName as ItemName,
 FirstLetter,
 Count(itemName)
FROM table1
CROSS APPLY (SELECT substring(itemName, 1,1) as FirstLetter) Alias
GROUP BY itemName, FirstLetter
6
votes

Caution that using alias in the Group By (for services that support it, such as postgres) can have unintended results. For example, if you create an alias that already exists in the inner statement, the Group By will chose the inner field name.

-- Working example in postgres
select col1 as col1_1, avg(col3) as col2_1
from
    (select gender as col1, maritalstatus as col2, 
    yearlyincome as col3 from customer) as layer_1
group by col1_1;

-- Failing example in postgres
select col2 as col1, avg(col3)
from
    (select gender as col1, maritalstatus as col2,
    yearlyincome as col3 from customer) as layer_1
group by col1;
4
votes

Some DBMSs will let you use an alias instead of having to repeat the entire expression.
Teradata is one such example.

I avoid ordinal position notation as recommended by Bill for reasons documented in this SO question.

The easy and robust alternative is to always repeat the expression in the GROUP BY clause.
DRY does NOT apply to SQL.

1
votes

Beware of using aliases when grouping the results from a view in SQLite. You will get unexpected results if the alias name is the same as the column name of any underlying tables (to the views.)

0
votes

Back in the day I found that Rdb, the former DEC product now supported by Oracle allowed the column alias to be used in the GROUP BY. Mainstream Oracle through version 11 does not allow the column alias to be used in the GROUP BY. Not sure what Postgresql, SQL Server, MySQL, etc will or won't allow. YMMV.

0
votes

In at least Postgres, you can use the alias name in the group by clause:

SELECT itemName as ItemName1, substring(itemName, 1,1) as FirstLetter, Count(itemName) FROM table1 GROUP BY ItemName1, FirstLetter;

I wouldn't recommend renaming an alias as a change in capitalization, that causes confusion.