I have a question about Spark broadcast joins. By default, the broadcast hash join threshold (spark.sql.autoBroadcastJoinThreshold) is 10 MB.
Case 1: the cluster has enough memory to hold the broadcast DataFrame.
If the DataFrame is larger than the default threshold, say 15 MB, and I explicitly broadcast it across all the nodes in the cluster, will Spark still perform a broadcast join? Or, since 15 MB exceeds the default threshold, will it fall back to some other join strategy even though I broadcast the DataFrame?
Case 2: the cluster does not have enough memory to hold the broadcast DataFrame.
Suppose I have a 15 MB DataFrame (a hypothetical size) and I broadcast it during the join, but one or more nodes lack the memory to hold it. Will the job fail with an out-of-memory error, or will Spark spill the data to disk?