I need some clarification regarding the Oozie launcher job.
1) Is the launcher job launched per workflow application (with several actions) or per action within a workflow application?
2) Use case: I have workflows that contain multiple shell actions, which internally run Spark, Hive, Pig and similar jobs. I use shell actions so that additional parameters, such as the partition date, can be computed with custom logic and passed to Hive, which runs .q files.
Example Shell File:
hive -hiveconf DATABASE_NAME="$1" -hiveconf MASTER_TABLE_NAME="$2" -hiveconf SOURCE_TABLE_NAME="$3" -f "$4"
Example .q File:
use ${hiveconf:DATABASE_NAME};
insert overwrite table ${hiveconf:MASTER_TABLE_NAME} select * from ${hiveconf:SOURCE_TABLE_NAME};
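To illustrate the "custom logic" part, here is a minimal sketch of how such a wrapper script could compute a partition date and hand it to Hive. It is not my exact script: the helper name compute_partition_date, the PARTITION_DATE variable, and the "previous day" rule are all hypothetical, and it assumes GNU date (for the -d option) is available on the node where the shell action runs.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the day before the given date (default: today)
# as YYYY-MM-DD. Relies on GNU date's -d option; -u pins the result to UTC.
compute_partition_date() {
  local base="${1:-today}"
  date -u -d "${base} 1 day ago" +%Y-%m-%d
}

# Only call hive when the four positional arguments passed by the
# workflow (database, master table, source table, .q file) are present.
if [ "$#" -ge 4 ]; then
  PARTITION_DATE="$(compute_partition_date)"
  hive -hiveconf DATABASE_NAME="$1" \
       -hiveconf MASTER_TABLE_NAME="$2" \
       -hiveconf SOURCE_TABLE_NAME="$3" \
       -hiveconf PARTITION_DATE="${PARTITION_DATE}" \
       -f "$4"
fi
```

With this sketch the .q file could read the value as ${hiveconf:PARTITION_DATE}, the same way it reads the other parameters.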
I set the oozie.launcher.mapreduce.job.queuename and mapreduce.job.queuename to different queues to avoid starvation of task slots in a single queue. I also omitted the <capture-output></capture-output> in the corresponding shell action. However, I still see the launcher job occupying a lot of memory from the launcher queue.
- Is this because the launcher job caches the log output that comes from hive?
- Is it necessary to give the launcher job enough memory when executing a shell action the way I am?
- What would happen if I explicitly limited the launcher job memory?
I would highly appreciate it if someone could outline the responsibilities of the Oozie launcher job.
Thanks!