0
votes

Is there a way to combine an GROUP BY and OBJECT_AGG in snowflake with different levels of aggregation between the two?

In the following example, we wish to return, for each city, an object mapping cuisine to the highest michelin stars of that cuisine in the city:

City Restaurant Cuisine Michelin Stars
San Fransisco Quince Californian 3
San Fransisco Coi Californian 2
San Fransisco Mister Jius Chinese 1
London Le Gavroche French 2
London La Dame de Pic French 2
London Restaurant Gordon Ramsay French 3

We want to produce the following:

City Cuisine to Top Rating
San Fransisco {'Californian': 3, 'Chinese': 1}
London {'French': 3}

My initial approach is:

SELECT
city,
OBJECT_AGG(cuisine, MAX(michelin_stars)::variant) over (partition by cuisine)
FROM
top_restaurants
GROUP BY city

This returns the error cuisine is not a valid group by expression.

This question is similar to

From the above, I believe the cause is the GROUP BY is calculated before the PARTITION BY and breaks when the GROUP BY drops cuisine as it attempts to aggregate.

References:

https://docs.snowflake.com/en/sql-reference/functions/object_agg.html

2
if I'm understanding correctly, you could use a CTE to do the aggregate first, and then create an object from that CTE result. However, based on your query, wouldn't you want to include the cuisine in your group by? Would that work?Mike Walton
Yes, I believe this could be done by first a CTE with GROUP BY on both city and cuisine, grabbing the highest michelin star by both, and then rolling up again by city. But I'd love to be able to do this cleanly in a single query! :)rmstmppr
While it is usually best to write sql in single queries without nesting, in some cases such as this one I personally wouldn't even try to do it because it would make the code difficult to understand and maintain. When there is too much complexity in a query, I recommend using a CTE to compartmentalize the logic.Maja F.

2 Answers

1
votes

I believe the cause is the GROUP BY is calculated before the PARTITION BY and breaks when the GROUP BY drops cuisine as it attempts to aggregate.

This is exactly what happened.

Related: A Beginner’s Guide to the True Order of SQL Operations


Alternative approach:

WITH cte AS (
   SELECT *
   FROM top_restaurants
   QUALIFY ROW_NUMBER() OVER(PARTITION BY city,cuisine 
                             ORDER BY michelin_stars DESC) = 1
)
SELECT city,
       OBJECT_AGG(cuisine, michelin_stars::variant)
FROM cte
GROUP BY city
0
votes

Thanks to the order of operation, you can still do it in one select. You just have to aggregate by city and cuisine first. When it's time for window function to shine, you partition by city. Obviously this leads to duplicates because window function simply applies calculations to the result set left by group by without collapsing any rows. Add a distinct and you're good to go

select distinct 
       city,
       object_agg(cuisine, max(michelin_stars)::variant) over (partition by city)
from t
group by city, cuisine;