
I am configuring a workflow in Oozie to execute a MapReduce task using a Java action. The workflow.xml I am using is as follows:

<workflow-app name="accesslogloader" xmlns="uri:oozie:workflow:0.1">
  <start to="javamain"/>
  <action name="javamain">
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${namenode}</name-node>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
        <property>
          <name>fs.hdfs.impl.disable.cache</name>
          <value>true</value>
        </property>
      </configuration>
      <main-class>org.path.AccessLogHandler</main-class>
    </java>
    <ok to="end"/>
    <error to="killjob"/>
  </action>
  <kill name="killjob">
    <message>"Job killed due to error"</message>
  </kill>
  <end name="end"/>  
</workflow-app>
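
For reference, the ${jobTracker}, ${namenode} and ${queueName} variables above are supplied at submission time through the job properties file. A minimal sketch (host names, ports and the application path below are placeholders, not my actual values):

namenode=hdfs://namenode-host:8020
jobTracker=jobtracker-host:8032
queueName=default
oozie.wf.application.path=${namenode}/user/${user.name}/accesslogloader

and the workflow is started with:

oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run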

After running the Oozie job, the MR job runs and saves data to HBase. I can see that the MR job completed, since the data is inserted into HBase.

But after completion, the Oozie UI shows the job in the KILLED state.

I am seeing the following error in the syslog:

2014-03-13 00:20:23,425 INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2014-03-13 00:20:24,311 ERROR [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: Filesystem closed
2014-03-13 00:20:24,315 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Filesystem closed
   at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:565)
   at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:589)
   at java.io.FilterInputStream.close(FilterInputStream.java:181)
   at org.apache.hadoop.util.LineReader.close(LineReader.java:149)
   at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:241)
   at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:207)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:438)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
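
For context, the main class follows roughly this shape. This is only an illustrative sketch (the real org.path.AccessLogHandler is not shown here), but it points at the FileSystem caching behaviour that fs.hdfs.impl.disable.cache is meant to work around:

// Illustrative sketch only: the real org.path.AccessLogHandler is not shown in
// this post; names and structure here are assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class AccessLogHandler {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Unless fs.hdfs.impl.disable.cache=true, FileSystem.get() returns a cached
    // instance that is shared by everything else in the same JVM, including the
    // record reader of the launcher map task that runs this main class.
    FileSystem fs = FileSystem.get(conf);
    try {
      // ... run the MapReduce job that loads the access logs into HBase ...
    } finally {
      // Closing a cached FileSystem closes it for all of its users; any later
      // HDFS call through that instance fails with "Filesystem closed".
      fs.close();
    }
  }
}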

What could be the problem?


1 Answer


I have the same problem. My Java action runs a series of complex jobs. Definitely not a good design, but it was the shortest way to reach the goal. I've tried to pass this property:

<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>

It doesn't help. My hypothesis is that the Java action runs longer than 10 minutes (the default timeout period for a MapReduce task), so the JobTracker kills it. My action runs for more than 10 minutes; I never hit this problem when the action ran for less than 10 minutes. I've tried to pass the property

<property>
  <name>mapred.task.timeout</name>
  <value>7200000</value>
</property>

but it is not picked up. Here is the action declaration:

<action name="long-running-java-action">
  <java>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
      <property>
        <name>mapred.queue.name</name>
        <value>default</value>
      </property>
      <property>
        <name>mapred.task.timeout</name>
        <value>7200000</value>
      </property>
      <property> <!-- https://issues.apache.org/jira/browse/SQOOP-1226 ???? -->
        <name>fs.hdfs.impl.disable.cache</name>
        <value>true</value>
      </property>
    </configuration>
    <main-class>my.super.mapreduce.Runner</main-class>
    <java-opts>-Xmx4096m</java-opts>
    <arg>--config</arg>
    <arg>complexConfigGoesHere</arg>
  </java>
  <ok to="end"/>
  <error to="kill"/>
</action>

I suppose the solution lies in increasing the task timeout.
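
If the timeout really is the cause, one thing that might work (I haven't verified it) is to raise the timeout of the Oozie launcher job itself: configuration properties prefixed with oozie.launcher. have the prefix stripped and are applied to the launcher map task that hosts the java action. So something like the following inside the action's <configuration> block could be worth trying:

<property>
  <!-- The oozie.launcher. prefix makes Oozie apply this to the launcher job,
       i.e. the map task that actually runs the java action. -->
  <name>oozie.launcher.mapred.task.timeout</name>
  <value>7200000</value>
</property>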