I am getting more and more confused as I read online resources about Spark architecture and scheduling. One resource says: "The number of tasks in a stage is the same as the number of partitions in the last RDD in the stage." On the other hand, another says: "Spark maps the number of tasks on a particular Executor to the number of cores allocated to it." So the first resource says that if I have 1000 partitions, I will have 1000 tasks no matter what machine I run on. In the second case, if I have a 4-core machine and 1000 partitions, then what? Will I have only 4 tasks? And then how is the rest of the data processed?
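To make the question concrete, here is a minimal sketch of the kind of job I have in mind (the numbers 1000 and local[4] are just assumptions for illustration, not my real setup):

```scala
import org.apache.spark.sql.SparkSession

object PartitionsVsCores {
  def main(args: Array[String]): Unit = {
    // Hypothetical setup: a single-JVM "cluster" with 4 cores available.
    val spark = SparkSession.builder()
      .appName("partitions-vs-cores")
      .master("local[4]")
      .getOrCreate()

    // An RDD explicitly split into 1000 partitions.
    val rdd = spark.sparkContext.parallelize(1 to 1000000, numSlices = 1000)
    println(s"Partitions: ${rdd.getNumPartitions}") // prints 1000

    // A simple action. As I understand it, the Spark UI shows this stage
    // with 1000 tasks, but only 4 of them can run at the same time here.
    println(s"Count: ${rdd.count()}")

    spark.stop()
  }
}
```

Is my reading of this correct, i.e. the stage always has 1000 tasks and the 4 cores only limit how many of them run at once?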
Another point of confusion: one statement says that each worker can process one task at a time, while another says that executors can run multiple tasks over their lifetime, both in parallel and sequentially. So are tasks run sequentially or in parallel?
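Here is how I currently picture the executor side, again just a sketch with made-up numbers (2 executors with 4 cores each when submitting to a cluster manager such as YARN):

```scala
import org.apache.spark.sql.SparkSession

object ExecutorSlots {
  def main(args: Array[String]): Unit = {
    // Hypothetical cluster settings: 2 executors, 4 cores each.
    // If my understanding is right, each executor then has 4 task slots,
    // so at most 8 tasks run in parallel, and the remaining tasks of a
    // 1000-task stage run sequentially in later waves on the same slots.
    val spark = SparkSession.builder()
      .appName("executor-slots")
      .config("spark.executor.instances", "2")
      .config("spark.executor.cores", "4")
      .getOrCreate()

    val rdd = spark.sparkContext.parallelize(1 to 1000000, 1000)
    println(rdd.map(_ * 2).sum())

    spark.stop()
  }
}
```

Is that mental model right, or am I mixing up workers, executors, and cores?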