I'm trying to compute the largest value in the following DataFrame, using Spark 1.6.1:
val df = sc.parallelize(Seq(1,2,3)).toDF("id")
A first approach is to select the maximum value, and it works as expected:
import org.apache.spark.sql.functions.max
df.select(max($"id")).show
A second approach could be to use withColumn, as follows:
df.withColumn("max", max($"id")).show
But unfortunately it fails with the following error message:
org.apache.spark.sql.AnalysisException: expression 'id' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;
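For reference, what the analyzer seems to be asking for is a global aggregation, e.g. an empty groupBy, but that just reproduces the first approach and returns a single row rather than a new column:

// The grouped form the error message hints at: a global (empty) groupBy.
// This yields one row holding the maximum, not a per-row column.
df.groupBy().agg(max($"id")).show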
How can I compute the maximum value inside withColumn without any Window or groupBy? If that's not possible, how can I do it in this specific case using a Window?
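For context, a minimal sketch of the two routes I can think of, assuming Spark 1.6 semantics: the Window variant uses an empty partitionBy (Spark warns that this moves all rows to a single partition), and the lit variant collects the aggregate to the driver first:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lit, max}

// Window variant: an all-rows window with no partition key. Spark logs a
// warning because every row is moved to a single partition, which is fine
// for a small DataFrame like this one.
df.withColumn("max", max($"id").over(Window.partitionBy())).show

// Driver variant: aggregate first, then attach the value as a literal column.
val maxId = df.agg(max($"id")).first().getInt(0)
df.withColumn("max", lit(maxId)).show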
max needs some kind of aggregation first (either a group by, which is what your second attempt expects, or a subquery, which is what your first working sample does), so I guess the error is only natural if you think in terms of SQL. - GPI
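To make that "subquery" framing concrete, here is a hedged sketch of the same idea in the DataFrame API: aggregate first into a one-row DataFrame, then join it back onto every row (broadcasting the single-row side keeps the no-condition join a cheap map-side operation):

import org.apache.spark.sql.functions.{broadcast, max}

// Aggregate first: a one-row DataFrame holding the maximum value.
val maxDf = df.select(max($"id").as("max"))

// Join with no condition: a cartesian product of n rows with 1 row.
// Broadcasting the single-row side avoids a shuffle.
df.join(broadcast(maxDf)).show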