8
votes

I need to be able to sort columns in ascending and descending order and also allow nulls to be first or nulls to be last. Using RDDs I could use the sortByKey method with a custom comparator. I was wondering if there is a corresponding approach using the Dataset API. I see how to to add desc/asc to columns but I have no clue on the nulls ordering.

2

2 Answers

12
votes

You can also do it with the dataset API:

scala>     val df = Seq("a", "b", null).toDF("x")
df: org.apache.spark.sql.DataFrame = [x: string]

scala> df.select('*).orderBy('x.asc_nulls_last).show
+----+
|   x|
+----+
|   a|
|   b|
|null|
+----+


scala> df.select('*).orderBy('x.asc_nulls_first).show
+----+
|   x|
+----+
|null|
|   a|
|   b|
+----+

Same thing works with desc_nulls_last and desc_nulls_first.

4
votes

As mentioned by Oleksandr, there was a pull request for this. Now you can optionally use "nulls first" or "nulls last"

scala> spark.sql("select * from spark_10747 order by col3 nulls last").show
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   6|   7|   4|
|   6|  11|   4|
|   6|  15|   8|
|   6|  15|   8|
|   6|   7|   8|
|   6|  12|  10|
|   6|   9|  10|
|   6|  13|null|
|   6|  10|null|
+----+----+----+