Spark scala window count max

Question

I have following df:-

result	state	clubName
win	XYZ	club1
win	XYZ	club2
win	XYZ	club1
win	PQR	club3

I need state wise max wining clubName

val byState =Window.partitionBy("state").orderBy('state)

I tried creating a window but does not helps..

Expected Result :-

Some like this in sql

select temp.res
(select count(result) as res
from table
group by clubName) temp
group by state

e.g

state	max_count_of_wins	clubName
XYZ	2	club1

What do you mean by state wise max wining clubName, can you add the expected result aswell ? — koiralo
club1 and club2 have the same number of wins. why do you choose club1? and why there is no row for state PQR in your result? — blackbishop
club1 has one win. Why does your sample result has two? In general, most of the time, if you can express what you need in sql, you can just use spark.sql to get it. Your sql isn't actually valid though. Perhaps, it makes sense to start with fixing that. — Dima

mck mck · Accepted Answer · 2021-03-10T11:53:58

You can get the win count for each club, then assign a rank for each club ordered by wins, and filter those rows with rank = 1.

import org.apache.spark.sql.expressions.Window

val df2 = df.withColumn(
    "wins", 
    count(when(col("result") === "win", 1))
    .over(Window.partitionBy("state","clubName"))
).withColumn(
    "rn", 
    row_number().over(Window.partitionBy("state").orderBy(desc("wins")))
).filter("rn = 1").selectExpr("state", "wins as max_count_of_wins", "clubName")

df2.show
+-----+-----------------+--------+
|state|max_count_of_wins|clubName|
+-----+-----------------+--------+
|  PQR|                1|   club3|
|  XYZ|                2|   club1|
+-----+-----------------+--------+

Spark scala window count max

2 Answers