I have two tables both are clustered on the same columns but while joining both the tables on clustered columns the execution plan shows both exchange and sort steps.
Both tables are bucketed on the same columns (key_column). Both the tables are or compressed and table A is partitioned and bucketed and table B is bucketed on the same column.
I want to avoid sort and exchange step from my plan and as per the documentation the bucketed table should have avoided both sort and exchange step.
I even tried with below hive properties:
spark.sql('set spark.sql.orc.filterPushdown=true')
spark.sql('set hive.optimize.bucketmapjoin = true')
spark.sql('set hive.optimize.bucketmapjoin.sortedmerge = true')
spark.sql('set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat')
spark.sql('set hive.optimize.bucketmapjoin = true')
spark.sql('set hive.stats.autogather=true')
spark.sql('set hive.compute.query.using.stats=true')
spark.sql('set hive.optimize.index.filter=true')
also collected stats for the tables:
Both Sort and Exchange can be seen in physical plan but hive bucketed tables should have avoided the sort and exchange step
[count#1311L])
+- *Project
+- *SortMergeJoin [key_column#1079], [key_column#1218],Inner
sort step: :- *Sort [key_column#1079 ASC NULLS FIRST], false, 0
exchange step: : +- Exchange hashpartitioning(key_column#1079, 200)
: +- *Filter isnotnull(key_column#1079)
Expected Result: NO SORT and EXCHANGE
[count#1311L])
+- *Project
+- *SortMergeJoin [key_column#1079], [key_column#1218], Inner
: +- *Filter isnotnull(key_column#1079)`enter code here`
I want to avoid sort and exchange step from my plan and as per the documentation the bucketed table should have avoided both sort and exchange step.