I use Databricks Community Edition.
My Spark program creates multiple jobs. Why? I thought there should be one job, which could have multiple stages.
My understanding is that when a Spark program is submitted, it creates one job with multiple stages (usually a new stage per shuffle operation). Below is the code I'm using, which has two possible shuffle operations (reduceByKey / sortByKey) and one action (take(5)).
rdd1 = sc.textFile('/databricks-datasets/flights')
rdd2 = (rdd1.flatMap(lambda x: x.split(","))
            .map(lambda x: (x, 1))
            .reduceByKey(lambda x, y: x + y, 8)   # shuffle 1
            .sortByKey(ascending=False)           # shuffle 2
            .take(5))                             # action
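For clarity, here is what that chain computes, sketched in plain Python on a small made-up sample (the toy `lines` list is my stand-in for the flight CSV read by sc.textFile; it is not real data from that dataset):

```python
from collections import defaultdict

# Hypothetical stand-in for the lines sc.textFile would read.
lines = ["a,b,a", "b,c"]

# flatMap: split each line on commas into individual tokens.
tokens = [tok for line in lines for tok in line.split(",")]

# map + reduceByKey: count occurrences of each token.
counts = defaultdict(int)
for tok in tokens:
    counts[tok] += 1

# sortByKey(ascending=False) then take(5): first 5 pairs by key, descending.
result = sorted(counts.items(), key=lambda kv: kv[0], reverse=True)[:5]
print(result)  # [('c', 1), ('b', 2), ('a', 2)]
```

So the program is just a word count over comma-separated fields, sorted by key; only the two keyed operations (reduceByKey, sortByKey) should require a shuffle.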
One more observation: each job seems to add a new stage (some of them are marked as skipped). What is causing the new jobs to be created?