My task is to create an Oozie workflow that loads data into Hive tables every hour.

I am using CDH 5.7 in VirtualBox.

When I run the Hive script containing the statement LOAD DATA INPATH '/sqoop_import_increment' INTO TABLE customer; it works perfectly: the data gets loaded into the Hive table.

But when I run the same script in an Oozie workflow, the job gets killed at 66% with the error message: Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001]

Note: a Hive script that only creates the table works perfectly in the Oozie workflow. Please help.

Hive script:

use test;

create external table if not exists customer(customer_id int,name string,address string)row format delimited fields terminated by ',';

load data inpath /sqoop_import_increment into table customer;

workflow.xml:

<workflow-app name="hive_script" xmlns="uri:oozie:workflow:0.5">
    <start to="hive-4327"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="hive-4327" cred="hcat">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>lib/hive-config.xml</job-xml>
            <script>lib/impala-script.hql</script>
        </hive>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

job.properties:

oozie.use.system.libpath=True
security_enabled=False
dryrun=False
jobTracker=localhost:8032
nameNode=hdfs://quickstart.cloudera:8020

hive-config.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

  <!-- Hive Configuration can either be stored in this file or in the hadoop configuration files  -->
  <!-- that are implied by Hadoop setup variables.                                                -->
  <!-- Aside from Hadoop setup variables - this file is provided as a convenience so that Hive    -->
  <!-- users do not have to edit hadoop configuration files (that may be managed as a centralized -->
  <!-- resource).                                                                                 -->

  <!-- Hive Execution Parameters -->

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://127.0.0.1/metastore?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>cloudera</value>
  </property>

  <property>
    <name>hive.hwi.war.file</name>
    <value>/usr/lib/hive/lib/hive-hwi-0.8.1-cdh4.0.0.jar</value>
    <description>This is the WAR file with the jsp content for Hive Web Interface</description>
  </property>

  <property>
    <name>datanucleus.fixedDatastore</name>
    <value>true</value>
  </property>

  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>false</value>
  </property>

  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://127.0.0.1:9083</value>
    <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
  </property>
</configuration>
It seems that you missed the single quotes around the path in the Hive script. - 54l3d
Did you check the launcher job and Hive job logs? What error do they report? - YoungHobbit
I have tried with single quotes as well; it doesn't work. - Surendar Sa
I checked the log in Oozie and it shows: Error:oozie-oozi-W@hive-4327] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [40000]. But I don't know where to check the Hive log. Can you help me? - Surendar Sa
Please provide hive-site.xml in your <job-xml> and try again. - Coder123
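
For reference, the suggestion in the last comment might look roughly like the action below. This is only a sketch: it assumes a copy of the cluster's hive-site.xml (on a CDH install it typically lives under /etc/hive/conf/) has been uploaded to the workflow's lib/ directory on HDFS. The path lib/hive-site.xml is an assumption; everything else is taken unchanged from the workflow.xml in the question.

```xml
<!-- Sketch only: assumes hive-site.xml was copied from the Hive client
     configuration into the workflow's lib/ directory on HDFS. -->
<action name="hive-4327" cred="hcat">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- Assumed location; used here in place of lib/hive-config.xml -->
        <job-xml>lib/hive-site.xml</job-xml>
        <script>lib/impala-script.hql</script>
    </hive>
    <ok to="End"/>
    <error to="Kill"/>
</action>
```

Using the same file the Hive CLI reads means the Oozie-launched job should talk to the same metastore as the manual run, which makes it easier to rule out a configuration mismatch.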

1 Answer

The last time I ran into this problem, it turned out that the Hive client was not installed on all data nodes.

When you run the Hive query manually, you presumably do so from a node that has the Hive client installed. But when Oozie is asked to run the query, it runs it from an arbitrary data node. As such, you will need to set up the Hive client on all data nodes.

This assumes that Oozie cannot run Hive queries in general (and that the problem is not specific to this particular command).