3
votes

I am trying to execute a Map-Reduce task in an Oozie workflow using a <java> action.

O'Reilly's Apache Oozie (Islam and Srinivasan 2015) notes that:

While it’s not recommended, Java action can be used to run Hadoop MapReduce jobs because MapReduce jobs are nothing but Java programs after all. The main class invoked can be a Hadoop MapReduce driver and can call Hadoop APIs to run a MapReduce job. In that mode, Hadoop spawns more mappers and reducers as required and runs them on the cluster.

However, I'm not having success using this approach.

The action definition in the workflow looks like this:

<java>
    <!-- Namenode etc. in global configuration -->
    <prepare>
      <delete path="${transformOut}" />
    </prepare>
    <configuration>
        <property>
            <name>mapreduce.job.queuename</name>
            <value>default</value>
        </property>
    </configuration>
    <main-class>package.containing.TransformTool</main-class>
    <arg>${transformIn}</arg>
    <arg>${transformOut}</arg>
    <file>${avroJar}</file>
    <file>${avroMapReduceJar}</file>
</java>

The Tool implementation's main() method looks like this:

public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new TransformTool(), args);
    if (res != 0) {
        throw new Exception("Error running MapReduce.");
    }
}
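For context, here is a minimal sketch of what the corresponding run() implementation might look like with the 'new' org.apache.hadoop.mapreduce API (class names, job name, and the omitted Avro mapper/reducer wiring are illustrative, not the actual driver from the question):

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;

public class TransformTool extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() carries the configuration ToolRunner built for us,
        // including properties Oozie passed to the launcher (e.g.
        // mapreduce.job.queuename from the action definition).
        Job job = Job.getInstance(getConf(), "transform");
        job.setJarByClass(TransformTool.class);
        // Avro mapper/reducer and schema setup omitted for brevity.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // waitForCompletion(true) prints progress and the child job ID to
        // stdout, which lands in the launcher container's logs -- useful
        // for finding the child application later.
        boolean ok = job.waitForCompletion(true);
        System.out.println("Child job: " + job.getJobID());
        return ok ? 0 : 1;
    }
}
```

Returning a non-zero exit code from run() (rather than throwing early) lets ToolRunner hand the result back to main() for the check shown above.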

The workflow will crash with the "Error running MapReduce" exception above every time; how do I get the output of the MapReduce to diagnose the problem? Is there a problem with using this Tool to run a MapReduce application? Am I using the wrong API calls?

I am extremely disinclined to use the Oozie <map-reduce> action, as each action in the workflow relies on several separately versioned AVRO schemas.

What's the issue here? I am using the 'new' mapreduce API for the task.

Thanks for any help.

Is your mapreduce job getting launched or not? You can check that in the Oozie UI. The java action starts a mapper, which in turn launches the actual mapreduce job for you, so check whether that child job is getting launched. – YoungHobbit
Also, are you setting these properties, ${transformIn}, ${transformOut}, ${avroJar} and ${avroMapReduceJar} in your job.properties? – Manjunath Ballur
Just a comment about setting mapreduce.job.queuename for a "launcher" action (i.e. Java, Shell, Sqoop... anything but MapReduce) >> it will be propagated to your child MapReduce job, if any, but not used for the "launcher" job itself; you should also set oozie.launcher.mapreduce.job.queuename for that one. And they can be different, e.g. a high-priority queue for launchers and the default queue for heavy-duty child MR. – Samson Scharfrichter
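Following that last comment, a sketch of how both queue properties could sit in the action's <configuration> block (queue names here are illustrative):

```xml
<configuration>
    <!-- Queue for the Oozie launcher job itself -->
    <property>
        <name>oozie.launcher.mapreduce.job.queuename</name>
        <value>launchers</value>
    </property>
    <!-- Queue propagated to the child MapReduce job -->
    <property>
        <name>mapreduce.job.queuename</name>
        <value>default</value>
    </property>
</configuration>
```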

1 Answer

6
votes

> how do I get the output of the MapReduce...

Back to the basics.

Since you don't mention which version of Hadoop and which version of Oozie you are using, I will assume a "recent" setup (e.g. Hadoop 2.7 w/ TimelineServer and Oozie 4.2). And since you don't mention which kind of interface you use (command line? native Oozie/YARN UI? Hue?) I will give a few examples using the good old CLI.

> oozie jobs -localtime -len 10 -filter name=CrazyExperiment

Shows the last 10 executions of the "CrazyExperiment" workflow, so that you can inject the appropriate "Job ID" into the next commands.

> oozie job -info 0000005-151217173344062-oozie-oozi-W

Shows the status of that execution, from Oozie's point of view. If your Java action is stuck in PREP mode, then Oozie failed to submit it to YARN; otherwise you will find something like job_1449681681381_5858 under "External ID". But beware! The job_ prefix is a legacy thing; the actual YARN ID is application_1449681681381_5858.

> oozie job -log 0000005-151217173344062-oozie-oozi-W

Shows the Oozie log, as could be expected.

> yarn logs -applicationId application_1449681681381_5858

Shows the consolidated logs for the AppMaster (container #1) and the Java action Launcher (container #2) -- after execution is over. The stdout log for the Launcher contains a large amount of Oozie debug output; the real stdout is at the very bottom.

In case your Java action successfully spawned another YARN job, and you were careful to display the child "application ID", you should be able to retrieve it there and run another yarn logs command against it.
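As a rough sketch (the application IDs below are illustrative, and the exact log wording varies by Hadoop version), the child ID can often be fished out of the launcher's aggregated logs and fed into a second yarn logs call:

```shell
# Look for the line the YARN client typically prints when the driver
# submits the child job from inside the launcher container.
yarn logs -applicationId application_1449681681381_5858 \
    | grep -i 'submitted application'

# Then pull the child application's own logs, substituting the ID found
# above (shown here as a made-up example).
yarn logs -applicationId application_1449681681381_5859
```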

Enjoy your next 5 days of debugging ;-)