I want to use spark SQL window functions to do some aggregations and windowing.
Suppose I'm using the example table provided here a: https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
I want to run the query to give me the max 2 revenue for each category and also the count of product for each category.
After I run this query
SELECT
product,
category,
revenue
FROM (
SELECT
product,
category,
revenue,
dense_rank() OVER (PARTITION BY category ORDER BY revenue DESC) as rank
count(*) OVER (PARTITION BY category ORDER BY revenue DESC) as count
FROM productRevenue) tmp
WHERE
rank <= 2
I got the table like this:
product category revenue count
pro2 tablet 6500 1
mini tablet 5500 2
instead of
product category revenue count
pro2 tablet 6500 5
mini tablet 5500 5
which is what I expected.
How should I write my code to get the right count for each category (instead of using another separate Group By statement)?
