I am new to Spark SQL. My role involves writing Spark SQL queries for data transformation. Recently I was introduced to the Broadcast Hash Join (BHJ) in Spark SQL. I understand that a BHJ performs very well when the broadcast table is very small, and that it can be induced by using query hints.
For example:
select /*+ BROADCAST(B) */
*
from A
Left Join B
on A.id = B.id;
I have also read that there are two types of broadcast joins, the Driver BHJ and the Executor BHJ, and that the latter yields better performance.
So, when I use a BROADCAST hint in my query, does Spark perform a Driver BHJ or an Executor BHJ?
How can I instruct Spark (via hints, etc.) to use an Executor BHJ instead of a Driver BHJ?
I am using Spark SQL 2.4.
Thanks