I am trying to do the broadcast hash join in spark 1.6.0 but could not succeed. below is example:
val DF1 = sqlContext.read.parquet("path1")
val DF2 = sqlContext.read.parquet("path2")
val Join = DF1.as("tc").join(broadcast(DF2.as("st")), Seq("col1"), "left_outer")
Even though I am using broadcast hint the explain on DF shows SortMergeOuterJoin. One reason for this I think is DF2 is greater than 20MB and by default property spark.sql.autoBroadcastJoinThreshold is 10 MB but I am not able to change the property of this variable in spark-shell. am I doing any thing wrong.
I tried as below
spark.sql.autoBroadcastJoinThreshold=100MB
scala> spark.sql.autoBroadcastJoinThreshold=100MB
<console>:1: error: Invalid literal number
spark.sql.autoBroadcastJoinThreshold=100MB
I need to set this property and try if I can do broadcast hash join and does that improve any performancs. I check many thread on stackoverflow but could not succeed. Can any one please help me here