0
votes

I'm doing what I would have expected to be a fairly straightforward query on a modified version of the imdb database:

select primary_name, release_year, max(rating)
from titles natural join primary_names natural join title_ratings
group by year
having title_category = 'film' and year > 1989;

However, I'm immediately running into

"column must appear in the GROUP BY clause or be used in an aggregate function."

I've tried researching this but have gotten confusing information; some examples I've found for this problem look structurally identical to mine, where others state that you must group every single selected parameter, which defeats the whole purpose of a group as I'm only wanting to select the maximum entry per year.

What am I doing wrong with this query?

Expected result: table with 3 columns which displays the highest-rated movie of each year.

2
Edit your question and provide sample data and desired results. As the error suggests, the query is wrong. However, you provide no clues on what the data actually looks like. - Gordon Linoff
Have you considered taking your error message literally? - Dan Bracuk
I've outlined this in the problem description; I don't WANT primary_names.primary_name to be in the group by clause, and I'm not sure what aggregate it's supposed to be used in or why it's necessary to do so. - MMMMMCK
MySQL 5.7.5 and up implements detection of functional dependence. If the ONLY_FULL_GROUP_BY SQL mode is enabled (which it is by default), MySQL rejects queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor are functionally dependent on them. (Before 5.7.5, MySQL does not detect functional dependency and ONLY_FULL_GROUP_BY is not enabled by default - dev.mysql.com/doc/refman/5.7/en/group-by-handling.html - P.Salmon
If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns. This causes MySQL to accept the preceding query. In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want. - dev.mysql.com/doc/refman/5.7/en/group-by-handling.html - P.Salmon

2 Answers

2
votes

If you want the maximum entry per year, then you should do something like this:

select r.*
from ratings r
where r.rating = (select max(r2.rating) where r2.year = r.year) and
      r.year > 1989;

In other words, group by is the wrong approach to writing this query.

I would also strongly encourage you to forget that natural join exists at all. It is an abomination. It uses the names of common columns for joins. It does not even use properly declared foreign key relationships. In addition, you cannot see what columns are used for the join.

While I am it, another piece of advice: qualify all column names in queries that have more than one table reference. That is, include the table alias in the column name.

0
votes

If you want to display all the columns you can user window function like :

select primary_name, year, max(rating) Over (Partition by year) as rating
from titles natural 
join primary_names natural join ratings
where title_type = 'film' and year > 1989;