I have to run some Spark Python scripts as Oozie workflows. I've tested the scripts locally with Spark, but when I submit them to Oozie I can't figure out why they fail. I'm using the Cloudera VM and managing Oozie with the Hue dashboard. Here is the workflow configuration for the Spark action:
Spark Master: local[*]
Mode: client
App name: myApp
Jars/py files: hdfs://localhost:8120/user/cloudera/example.py
Main class: org.apache.spark
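For reference, the Hue settings above roughly correspond to a `spark` action in the `workflow.xml` that Hue generates. This is a sketch, not my actual generated file; note that for a Python script the `<class>` element is normally omitted entirely (`org.apache.spark` is a package, not a runnable class):

```xml
<workflow-app name="myApp-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>local[*]</master>
            <mode>client</mode>
            <name>myApp</name>
            <!-- for PySpark, <jar> points at the .py file and <class> is left out -->
            <jar>hdfs://localhost:8120/user/cloudera/example.py</jar>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```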
I also tried to run a simple example that just prints something, but for every script I submit, Oozie gives me this output:
>>> Invoking Spark class now >>>
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://quickstart.cloudera:8020/user/cloudera/oozie-oozi/0000005-161228161942928-oozie-oozi-W/spark-cc87--spark/action-data.seq
Oozie Launcher ends
[EDIT]
I found out that the workflow starts only if I set the Spark master to yarn-cluster, but even in this mode the launched YARN container gets stuck at 95% map completion while the Spark app remains in status ACCEPTED. I'm trying to change the YARN memory parameters to allow the Spark action to start. The stdout just prints Heartbeat.
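In case it helps anyone hitting the same ACCEPTED state: the Oozie launcher is itself a YARN container, so on a small VM it can hold enough memory that the Spark application master never gets scheduled. These are the properties I'd tune in `yarn-site.xml` (the values here are illustrative for a quickstart VM, not a recommendation):

```xml
<!-- yarn-site.xml: illustrative values; adjust to the VM's available RAM -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value> <!-- total memory the NodeManager can allocate -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value> <!-- ceiling for a single container -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>  <!-- floor for a single container -->
</property>
```

The idea is to leave enough headroom that both the launcher container and the Spark AM container can run at the same time.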
[SOLVED]
The Oozie workflow starts only if the py file is local and manually placed into the lib folder after Hue has created the workflow folder. I think the best solution is still to write a shell script that does a spark-submit.
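A minimal sketch of that shell-action approach, assuming the script runs on a node with the Spark and Hadoop clients installed (the script name is mine, and the HDFS URL simply mirrors the one from my configuration; make sure the port matches your `fs.defaultFS`, which is 8020 on the quickstart VM):

```shell
#!/bin/bash
# submit-example.sh -- invoked from an Oozie shell action (sketch)
set -euo pipefail

# Copy the PySpark script from HDFS to the launcher's working directory.
hdfs dfs -get hdfs://localhost:8120/user/cloudera/example.py example.py

# Submit it to YARN directly, bypassing the Oozie spark action.
spark-submit \
  --master yarn \
  --deploy-mode client \
  example.py
```

This sidesteps the spark action entirely, so the py file no longer has to live in the workflow's lib folder.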