
I am new to Spark. I have a couple of questions regarding the Spark Web UI:

  • I have seen that Spark can create multiple Jobs for the same application. On what basis does it create the Jobs?

  • I understand Spark creates multiple Stages for a single Job around
    shuffle boundaries, and that there is one task per partition. However,
    I have seen a particular Stage (e.g. Stage 1) of a Job create fewer
    tasks than the default shuffle partitions value (e.g. only 2/2
    completed), and the next Stage (Stage 2) of the same Job create 1500
    tasks (e.g. 1500/1500 completed), which is more than the default
    shuffle partitions value.

    So, how does Spark determine how many tasks it should create for any particular Stage?

Can anyone please help me understand the above.
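Since one task per partition is the rule, the question reduces to "how many partitions does each stage process?" The sketch below illustrates the arithmetic with plain Python. The 128 MB split size and 200 shuffle partitions are typical defaults, and the helper functions are illustrative, not Spark APIs:

```python
import math

# Typical default input split size (e.g. HDFS block size); an assumption here.
DEFAULT_SPLIT_SIZE = 128 * 1024 * 1024

def input_stage_tasks(file_size_bytes, split_size=DEFAULT_SPLIT_SIZE):
    # A stage that scans a file gets roughly one task per input split,
    # regardless of the shuffle partitions setting.
    return max(1, math.ceil(file_size_bytes / split_size))

def shuffle_stage_tasks(shuffle_partitions=200, repartition_to=None):
    # A stage after a shuffle gets one task per shuffle partition
    # (spark.sql.shuffle.partitions, default 200); an explicit
    # repartition(n) overrides that for the next stage.
    return repartition_to if repartition_to is not None else shuffle_partitions

# A 200 MB file spans 2 splits, so the scan stage runs only 2 tasks (2/2),
# even though the default shuffle partitions value is 200.
print(input_stage_tasks(200 * 1024 * 1024))       # 2

# An explicit repartition(1500) makes the next stage run 1500 tasks.
print(shuffle_stage_tasks(repartition_to=1500))   # 1500
```

This is one plausible explanation for the observed 2/2 and 1500/1500 counts: the first stage's task count came from the input partitioning, and the second stage's from an explicit repartitioning rather than the default.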


1 Answer


The maximum number of tasks that can run at any one moment depends on your number of executors and the cores per executor. Different stages have different task counts because each stage gets one task per partition of the data it processes.
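The distinction the answer is drawing can be sketched as follows: the total task count of a stage (what the UI shows as N/N completed) is separate from how many of those tasks run concurrently. The executor and core numbers below are made up for illustration:

```python
import math

def max_concurrent_tasks(num_executors, cores_per_executor):
    # Spark runs at most one task per available executor core at a time
    # (assuming the default of 1 core per task).
    return num_executors * cores_per_executor

def task_waves(total_tasks, num_executors, cores_per_executor):
    # A stage with more tasks than cores runs them in successive "waves".
    return math.ceil(total_tasks / max_concurrent_tasks(num_executors, cores_per_executor))

# With 5 executors x 4 cores, a 1500-task stage runs 20 tasks at once,
# finishing in 75 waves.
print(max_concurrent_tasks(5, 4))     # 20
print(task_waves(1500, 5, 4))         # 75
```

So the cluster's cores bound concurrency, while the stage's partition count determines the total number of tasks the UI reports.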