Why does the Spark Planner in Spark 2.3 prefer a sort merge join over a shuffled hash join? In other words, why is the spark.sql.join.preferSortMergeJoin configuration property internal and turned on by default? What is wrong with a shuffled hash join? Is this specific to Spark doing its computations in a distributed fashion, or is it something more inherent in the join algorithm itself?
You can find the property used in the JoinSelection execution planning strategy (here and here), in a guard that looks like this:
case ... if !conf.preferSortMergeJoin && ... =>
  Seq(joins.ShuffledHashJoinExec(...))
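For context, here is a minimal sketch (the session setup, table names, and data are made up for illustration) of how one might turn the property off and inspect which join the planner actually picks:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("join-selection-demo")
  // flip the internal property so JoinSelection no longer prefers sort merge join
  .config("spark.sql.join.preferSortMergeJoin", "false")
  .getOrCreate()

import spark.implicits._

val left  = Seq((1, "a"), (2, "b")).toDF("id", "v")
val right = Seq((1, "x"), (2, "y")).toDF("id", "w")

// Note: preferSortMergeJoin=false alone is not sufficient for a
// ShuffledHashJoinExec; the other JoinSelection conditions (e.g. the build
// side being small enough to hash, and much smaller than the other side)
// must also hold, otherwise the planner still falls back to sort merge join.
left.join(right, "id").explain()
```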