2
votes

Is it ok to spawn multiple Spark jobs from within a main Spark job? My main Spark job's driver, which is launched on the YARN cluster, will do some preprocessing and, based on that, needs to launch multiple Spark jobs on the YARN cluster. I am not sure if this is the right pattern.

The main Spark job would launch the other Spark jobs, similar to calling spark-submit multiple times from within a Spark driver program. The threads spawned for these new jobs will run totally different components, so they cannot be implemented using Spark actions.

Please share your thoughts.

The sample code I have is below for better understanding:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.launcher.SparkLauncher

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object MainSparkJob {

  def main(args: Array[String]): Unit = {

    val sc = new SparkContext(new SparkConf().setAppName("MainSparkJob"))
    val hiveContext = new HiveContext(sc)

    // Fetch from Hive using hiveContext
    // Fetch from HBase

    // Spawn multiple Futures, each launching a child Spark job on YARN
    val future1 = Future {
      val process = new SparkLauncher()
        .setAppResource("/path/to/child-job1.jar")   // placeholder jar
        .setMainClass("com.example.ChildJob1")       // placeholder class
        .setMaster("yarn")
        .setDeployMode("cluster")
        .launch()
      process.waitFor()
    }

    // Similarly, future2 to futureN

    future1.onComplete { result => /* handle success / failure */ }
  }
} // end of main Spark job

1 Answer

0
votes

Use a workflow management tool like Oozie to orchestrate this kind of dependency between jobs.

Oozie has a Spark action, shell action, Hive action, Java action, DistCp action, and email action: everything is available there.

So we can set up a nice dependency between the jobs using Oozie.
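
For illustration, here is a minimal Oozie workflow sketch that chains a preprocessing Spark job to a dependent child Spark job via the Spark action. The application names, jar paths, and classes (main-job.jar, ChildJob1, the HDFS locations) are assumptions for the example, not taken from the question:

<workflow-app name="spark-pipeline" xmlns="uri:oozie:workflow:0.5">
    <start to="preprocess"/>

    <action name="preprocess">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>Preprocess</name>
            <class>com.example.MainSparkJob</class>
            <jar>${nameNode}/apps/jobs/main-job.jar</jar>
        </spark>
        <ok to="child-job-1"/>
        <error to="fail"/>
    </action>

    <action name="child-job-1">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>ChildJob1</name>
            <class>com.example.ChildJob1</class>
            <jar>${nameNode}/apps/jobs/child-job-1.jar</jar>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

The ok/error transitions express the dependency between the jobs; fork and join control nodes can be used to run several child jobs in parallel, and a coordinator can schedule the whole workflow.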