I have a DataFrame where I am aggregating on a column and picking the last element of each group, but it returns a different result every time I run it. Is there a way to resolve this so that I get the same, correct result on every run?
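import org.apache.spark.sql.functions.last
// toDF needs spark.implicits._, which spark-shell imports automatically

val sourceDF = Seq(
  (11, "a1", "a2"),
  (11, "b1", "b2"),
  (22, "c1", "c2"),
  (22, "d1", "d2"),
  (33, "e1", "e2")
).toDF("id", "name", "city")
sourceDF.show(false)

// Pick the last name and city seen for each id
sourceDF.groupBy("id").agg(
  last("name"),
  last("city")
).show(false)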
+---+-----------------+-----------------+
|id |last(name, false)|last(city, false)|
+---+-----------------+-----------------+
|33 |e1               |e2               |
|11 |a1               |a2               |
|22 |c1               |c2               |
+---+-----------------+-----------------+
Thanks in advance.
last function documentation -> The function is non-deterministic because its results depends on order of rows which may be non-deterministic after a shuffle. Use an order by to get deterministic results. - Vamsi Prabhala
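To illustrate the comment above, here is a minimal sketch of one way to make the "last row per id" choice deterministic. It assumes the data carries a column that records the intended row order. The sample data in the question has no such column, so the `seq` column below is hypothetical and added purely for illustration. Note that this swaps `last` for `row_number` over a window ordered by `seq`, which is a different technique from the original aggregation.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// In spark-shell a session already exists; getOrCreate() simply reuses it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("deterministic-last")
  .getOrCreate()
import spark.implicits._

// `seq` is a hypothetical ordering column (assumption, not in the original data).
val sourceDF = Seq(
  (1, 11, "a1", "a2"),
  (2, 11, "b1", "b2"),
  (3, 22, "c1", "c2"),
  (4, 22, "d1", "d2"),
  (5, 33, "e1", "e2")
).toDF("seq", "id", "name", "city")

// Within each id, rank rows from highest seq to lowest.
val byIdLatestFirst = Window.partitionBy("id").orderBy(col("seq").desc)

// Keep only the top-ranked row per id, i.e. the one with the largest seq.
val lastPerId = sourceDF
  .withColumn("rn", row_number().over(byIdLatestFirst))
  .filter(col("rn") === 1)
  .drop("rn", "seq")
  .orderBy("id")

lastPerId.show(false)
// Rows: (11, b1, b2), (22, d1, d2), (33, e1, e2)
```

The result is only stable if `seq` is unique within each `id`; if two rows tie on the ordering column, the choice between them is again arbitrary.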