2 votes

Let's say I have several independent jobs, such as writing in parallel to multiple stores: I take a collection, perform an operation on it, and then write the result to a file system and to 3 other stores.

How can I run those 4 write operations in parallel?

I'm working with Scala, where the normal approach would be to launch 4 futures, so I wonder if I can do the same here.

What makes me uneasy is that I have no idea how an ExecutionContext/ThreadPoolExecutor actually interacts with Spark's job scheduling.

That is, if I do

Future { job1 }
Future { job2 }
Future { job3 }
Future { job4 }

what happens then? Can someone explain the Spark mechanics here? How will the futures be sent to the executors for execution? Let's say I have fair scheduling enabled. What happens next? How does Spark handle work that is wrapped in a future?
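For context, a minimal sketch of what I mean by "launch 4 futures". Here `processed` and the writeToStoreA/B/C helpers are hypothetical placeholders for the transformed DataFrame and the store-specific writers; each Future body just calls a Spark action from its own driver thread, while the actual work still runs on the executors:

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Dedicated pool for the driver-side threads that submit the write jobs.
implicit val ec: ExecutionContext =
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(4))

// `processed` and writeToStoreA/B/C are hypothetical placeholders.
val writes = Seq(
  Future { processed.write.parquet("hdfs:///out/data") }, // file system
  Future { writeToStoreA(processed) },
  Future { writeToStoreB(processed) },
  Future { writeToStoreC(processed) }
)

// Each Future blocks only its own driver thread on a Spark action;
// wait here until all four jobs have finished.
Await.result(Future.sequence(writes), Duration.Inf)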


1 Answer

0 votes

By default, applications submitted to a standalone mode cluster run in FIFO (first-in-first-out) order, and each application will try to use all available nodes.

Mesos mode offers dynamic sharing of CPU cores. In this mode, each Spark application still has a fixed and independent memory allocation, but when an application is not running tasks on a machine, other applications may run tasks on those cores.
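As a hedged example of keeping one application from taking the whole standalone/Mesos cluster, the spark.cores.max setting caps how many cores an application requests (the application name and the value 8 are illustrative):

import org.apache.spark.SparkConf

// Cap this application at 8 cores so other applications submitted to the
// same standalone/Mesos cluster can also get resources.
val conf = new SparkConf()
  .setAppName("store-writer")
  .set("spark.cores.max", "8")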

See this doc: https://spark.apache.org/docs/latest/job-scheduling.html
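For scheduling within a single application, which is what the Futures in the question would exercise, the linked doc describes the fair scheduler. A minimal sketch, assuming the configuration is applied before the SparkContext is created (the pool name "writes" is just an example):

import org.apache.spark.{SparkConf, SparkContext}

// FAIR mode lets jobs submitted concurrently from different driver threads
// (e.g. the Futures above) share executors instead of queuing up FIFO.
val conf = new SparkConf()
  .setAppName("parallel-writes")
  .set("spark.scheduler.mode", "FAIR")
val sc = new SparkContext(conf)

// Optionally tag jobs submitted from the current thread with a pool;
// pools and their weights can be tuned in fairscheduler.xml (see the doc).
sc.setLocalProperty("spark.scheduler.pool", "writes")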