5
votes

I'm a newbie in Oozie and I've read some Oozie shell action examples but this got me confused about certain things.

There are examples I've seen where there is no <file> tag.

Some example, like in Cloudera here, repeats the shell script in file tag:

<shell xmlns="uri:oozie:shell-action:0.2">
    <exec>check-hour.sh</exec>
    <argument>${earthquakeMinThreshold}</argument>
    <file>check-hour.sh</file>
</shell>

While in Oozie's website, writes the shell script (the reference ${EXEC} from job.properties, which points to script.sh file) twice, separated by #.

<shell xmlns="uri:oozie:shell-action:0.1">
    ...
    <exec>${EXEC}</exec>
    <argument>A</argument>
    <argument>B</argument>
    <file>${EXEC}#${EXEC}</file>
</shell>

There are also examples I've seen where the path (HDFS or local?) is prepended before the script.sh#script.sh within the <file> tag.

<shell xmlns="uri:oozie:shell-action:0.1">
    ...
    <exec>script.sh</exec>
    <argument>A</argument>
    <argument>B</argument>
    <file>/path/script.sh#script.sh</file>
</shell>

As I understand, any shell script file can be included in the workflow HDFS path (same path where workflow.xml resides).

Can someone explain the differences in these examples and how <exec>, <file>, script.sh#script.sh, and the /path/script.sh#script.sh are used?

1

1 Answers

19
votes

<file>hdfs:///apps/duh/mystuff/check-hour.sh</file> means "download that HDFS file into the Current Working Dir of the YARN container that runs the Oozie Launcher for the Shell action, using the same file name by default, so that I can reference it as ./check-hour.sh or simply check-hour.sh in the <exec> element".

<file>check-hour.sh</file> means "download that HDFS file -- from my user's home dir e.g. hdfs:///user/borat/check-hour.sh -- into etc. etc.".

<file>hdfs:///apps/duh/mystuff/check-hour.sh#youpi</file> means "download that HDFS file etc. etc., renaming it as youpi, so that I can reference it as ./youpi or simply youpi in the element".

Note that the Hue UI often inserts unnecessary # stuff with no actual name change. That's why you will see it so often.