I'm trying to convert a simple aggregation from PySpark to Scala.
The dataframes:
# PySpark
from pyspark.sql import functions as F

df = spark.createDataFrame(
    [([10, 100],),
     ([20, 200],)],
    ['vals'])
// Scala
val df = Seq(
  Seq(10, 100),
  Seq(20, 200)
).toDF("vals")
Aggregation expansion works fine in PySpark:
df2 = df.agg(
    *[F.sum(F.col("vals")[i]).alias(f"col{i}") for i in range(2)]
)
df2.show()
# +----+----+
# |col0|col1|
# +----+----+
# | 30| 300|
# +----+----+
But in Scala it fails to compile:
val df2 = df.agg(
  (0 until 2).map(i => sum($"vals"(i)).alias(s"col$i")): _*
)
error: no `: _*` annotation allowed here
(such annotations are only allowed in arguments to *-parameters)
  (0 until 2).map(i => sum($"vals"(i)).alias(s"col$i")): _*
                                                        ^
The syntax seems almost the same as in this select, which works fine:
val df2 = df.select(
  (0 until 2).map(i => $"vals"(i).alias(s"col$i")): _*
)
Does expression expansion work in Scala Spark aggregations? How?
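For reference, here is a sketch of what I believe is going on, assuming the standard Spark SQL API: select has an overload declared as select(cols: Column*), a single varargs parameter, so the `: _*` splat is legal there. agg's Column-based overload is agg(expr: Column, exprs: Column*), where the first aggregate is a separate non-varargs parameter, so the sequence has to be split into a head plus a splatted tail:

// Sketch of a workaround, assuming Dataset.agg(expr: Column, exprs: Column*).
// Build the aggregate columns once, then pass head and tail separately.
import org.apache.spark.sql.functions.sum

val aggs = (0 until 2).map(i => sum($"vals"(i)).alias(s"col$i"))
val df2 = df.agg(aggs.head, aggs.tail: _*)
// df2.show() should produce the same 30 / 300 result as the PySpark version

The head/tail split is the usual idiom for any Spark method shaped like (first: Column, rest: Column*), such as agg and groupBy.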