I need some clarification regarding the Oozie launcher job.

1) Is the launcher job launched per workflow application (with several actions) or per action within a workflow application?

2) Use case: I have workflows that contain multiple shell actions (which internally run Spark, Hive, Pig jobs, etc.). I use shell actions because additional parameters, such as the partition date, can be computed with custom logic and passed to Hive through .q files.

Example Shell File:

hive -hiveconf DATABASE_NAME="$1" -hiveconf MASTER_TABLE_NAME="$2" -hiveconf SOURCE_TABLE_NAME="$3" -f "$4"

Example .q File:

use ${hiveconf:DATABASE_NAME};
insert overwrite table ${hiveconf:MASTER_TABLE_NAME} select * from ${hiveconf:SOURCE_TABLE_NAME};
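
For illustration, the "custom logic" part could look roughly like the sketch below. It is only a sketch under stated assumptions: it assumes GNU date is available on the node running the shell action, and the names partition_date, build_hive_cmd and PARTITION_DATE are hypothetical, not part of the actual scripts. It prints the hive arguments one per line instead of running hive, so the logic can be checked without a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a shell action wrapper (names are illustrative).

# Print the day before the given date as YYYY-MM-DD.
# Assumes GNU date (-d with a relative "1 day ago" expression).
partition_date() {
  date -u -d "$1 1 day ago" +%F
}

# Print the hive command line, one argument per line.
# Arguments: database, master table, source table, .q file, partition date.
# PARTITION_DATE is an assumed extra hiveconf variable for the .q file.
build_hive_cmd() {
  local db="$1" master="$2" source_table="$3" qfile="$4" pdate="$5"
  printf '%s\n' hive \
    -hiveconf "DATABASE_NAME=$db" \
    -hiveconf "MASTER_TABLE_NAME=$master" \
    -hiveconf "SOURCE_TABLE_NAME=$source_table" \
    -hiveconf "PARTITION_DATE=$pdate" \
    -f "$qfile"
}
```

Calling build_hive_cmd "$1" "$2" "$3" "$4" "$(partition_date "$(date -u +%F)")" would print the arguments for yesterday's partition. The .q file could then read the value as ${hiveconf:PARTITION_DATE}.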

I set oozie.launcher.mapreduce.job.queuename and mapreduce.job.queuename to different queues to avoid starving a single queue of task slots. I also omitted <capture-output></capture-output> from the corresponding shell action. However, I still see the launcher job occupying a lot of memory in the launcher queue.

  • Is this because the launcher job caches the log output that comes from Hive?
  • Does the launcher job need a generous memory allocation when a shell action runs Hive this way?
  • What would happen if I explicitly limited the launcher job memory?

I would highly appreciate it if someone could outline the responsibilities of the oozie launcher job.

Thanks!

1 Answer

Is the launcher job launched per workflow application (with several actions) or per action within a workflow application?

The launcher job is launched per action in the workflow.

I would strongly recommend using the dedicated Oozie actions (Hive, Pig, etc.) rather than shell actions, because that lets Oozie manage your workflow and its actions more effectively.