1
votes

How can I run an Oozie Hive or Hive2 Action with init scripts?

On the CLI this is usually done via the -i init.hive argument; however, when I pass it in an Oozie action via <argument>-i init.hive</argument>, the workflow stops with an error.

I linked the init.hive file with the <file>init.hive#init.hive</file> element, and it is available in the local appcache directory.

$ ll appcache/application_1480609892100_0274/container_e55_1480609892100_0274_01_000001/ | grep init
> lrwxrwxrwx 1 root root    42 Jan 12 12:24 init.hive -> /hadoop/yarn/local/filecache/519/init.hive

The error (in the local appcache) is the following:

Connecting to jdbc:hive2://localhost:10000/
Connected to: Apache Hive (version 1.2.1000.2.4.0.0-169)
Driver: Hive JDBC (version 1.2.1000.2.4.0.0-169)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Running init script  init.hive
  init.hive (No such file or directory)

The hive2 action looks like this (the complete workflow can be found on GitHub: https://github.com/chaosmail/oozie-bugs/tree/master/simple-hive-init/simple-hive-init-wf):

<action name="test-action">
  <hive2 xmlns="uri:oozie:hive2-action:0.1">
    <jdbc-url>${jdbcURL}</jdbc-url>
    <script>query.hive</script>
    <argument>-i init.hive</argument>
    <file>init.hive#init.hive</file>
  </hive2>
  <ok to="end"/>
  <error to="fail"/>
</action>

Edit 1: added workflow action

1
Can you provide the complete workflow/action used here? Where have you stored the init.hive file? - YoungHobbit
The init.hive file is stored in the same dir as the workflow.xml - Christoph Körner
Interestingly, when I run a shell action that prints pwd, the output matches the appcache/application_1480609892100_0274/container_e55_1480609892100_0274_01_000001 directory. It seems that the beeline client might not be started in the same local container directory; otherwise it should find the init.hive file - Christoph Körner
The Oozie documentation oozie.apache.org/docs/4.2.0/DG_Hive2ActionExtension.html states that you may have multiple <argument> tags, and your logs show a leading space before the script name (the space following -i??) >> consider trying <argument>-i</argument><argument>init.hive</argument> - Samson Scharfrichter
I am referring to the Oozie source code, where paths passed in the <script> tag get cached separately. That's why a relative path also works in the <script> tag. github.com/apache/oozie/blob/master/core/src/main/java/org/… - Christoph Körner

1 Answer

2
votes

[Recap of the comments thread above, plus some extra stuff in retrospect]

The Oozie documentation states that you may have multiple <argument> elements in your Action, which hints that the arguments must be provided separately.
In retrospect, it makes sense -- on a command line, it's the shell that would parse the list of arguments into an args[] array for the Java executable, but Oozie is not a shell interpreter...

And experience shows that Beeline accepts two syntax variants for its command-line args...

  • -xValue (one arg) means option -x with associated Value
  • -x followed by Value (two args) means the same thing

...so in an Oozie action, either of these forms should work:

  • <argument>-xValue</argument>
  • <argument>-x</argument> <argument>Value</argument>

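On the other hand, <argument>-x Value</argument> fails: Oozie passes the whole string as a single argument, and in single-arg syntax Beeline treats everything after -x, including the separator space, as part of the value. In the question's case it therefore looks for a file named " init.hive" (with a leading space), which matches the doubled space in the log line "Running init script  init.hive".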
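
Applied to the action from the question, the two-argument variant might look like the sketch below. Only the <argument> lines differ from the original; everything else is copied from the question and has not been re-tested here, so treat it as an illustration rather than a verified workflow.

```xml
<action name="test-action">
  <hive2 xmlns="uri:oozie:hive2-action:0.1">
    <jdbc-url>${jdbcURL}</jdbc-url>
    <script>query.hive</script>
    <!-- Option and value as two separate arguments, so Beeline
         receives "-i" and "init.hive" as distinct args[] entries. -->
    <argument>-i</argument>
    <argument>init.hive</argument>
    <!-- Unchanged: makes init.hive available in the container's
         working directory under the name init.hive. -->
    <file>init.hive#init.hive</file>
  </hive2>
  <ok to="end"/>
  <error to="fail"/>
</action>
```

The single-argument form described above would replace the two <argument> lines with one <argument> element that contains the option and the file name with no space between them.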