I have a Python script which I'm able to run through spark-submit. I need to use it in Oozie.

<!-- move files from local disk to hdfs -->
<action name="forceLoadFromLocal2hdfs">
<shell xmlns="uri:oozie:shell-action:0.3">
  <job-tracker>${jobTracker}</job-tracker>
  <name-node>${nameNode}</name-node>
  <configuration>
    <property>
      <name>mapred.job.queue.name</name>
      <value>${queueName}</value>
    </property>
  </configuration>
  <exec>driver-script.sh</exec>
  <!-- single -->
  <argument>s</argument>
  <!-- py script -->
  <argument>load_local_2_hdfs.py</argument>
  <!-- local file to be moved -->
  <argument>localPathFile</argument>
  <!-- hdfs destination folder; be aware the script deletes any existing folder! -->
  <argument>hdfFolder</argument>
  <file>${workflowRoot}driver-script.sh#driver-script.sh</file>
  <file>${workflowRoot}load_local_2_hdfs.py#load_local_2_hdfs.py</file>
</shell>
<ok to="end"/>
<error to="killAction"/> 
</action>
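
In effect, the action runs the equivalent of:

./driver-script.sh s load_local_2_hdfs.py localPathFile hdfFolder

in the shell action's working directory; the two <file> entries are what make both scripts available there.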

Run directly through driver-script.sh, the script works fine. Through Oozie, even though the status of the workflow is SUCCEEDED, the file is not copied to HDFS. I was not able to find any error logs, or any logs related to the PySpark job.
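
For context, driver-script.sh is essentially a thin wrapper around spark-submit. A simplified sketch (the exact flags in my real script differ):

#!/bin/bash
# driver-script.sh (simplified): first argument selects the mode,
# second is the Python script, the rest are passed through to it.
MODE="$1"
PY_SCRIPT="$2"
shift 2

if [ "$MODE" = "s" ]; then
  spark-submit "$PY_SCRIPT" "$@"
fi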

I have another question here about Spark logs being suppressed by Oozie.

1 Answer


Add set -x at the beginning of your script; that will show you which line the script is at. You can see that output in stderr.
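
For example:

#!/bin/bash
# Print each command (with expanded variables) to stderr before running it.
# That trace shows up in the stderr log of the Oozie action.
set -x

# ... rest of driver-script.sh ...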

Can you elaborate on what you mean by "the file is not copied"? That will help us help you better.
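
For instance, does the destination path exist at all after the workflow finishes? A quick check from the command line, using the same destination folder you pass to the action:

hdfs dfs -ls hdfFolder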