I am new to Spark. I have a couple of questions regarding the Spark Web UI:
1. I have seen that Spark can create multiple Jobs for the same application. On what basis does it create the Jobs?

2. I understand that Spark creates multiple Stages for a single Job around shuffle boundaries. I also understand that there is 1 task per partition. However, I have seen a particular Stage (e.g. Stage 1) of a particular Job create fewer tasks than the default shuffle partitions value (e.g. only 2/2 completed). I have also seen the next Stage (Stage 2) of the same Job create 1500 tasks (e.g. 1500/1500 completed), which is more than the default shuffle partitions value. So, how does Spark determine how many tasks it should create to execute a particular Stage?
Can anyone please help me understand this?