
I'm trying to set up Oozie on a CDH 5.7 cluster. I've installed and configured everything by following the steps from the Cloudera documentation. Finally, I extracted oozie-examples.tar.gz, put it to HDFS, and tried to run some examples. The MR example runs fine, but the Spark one fails with the following error:

Resource hdfs://cluster/user/hdfs/.sparkStaging/application_1462195303197_0009/oozie-examples.jar changed on src filesystem (expected 1462196523983, was 1462196524951)
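
For reference, extracting and uploading the examples looked roughly like this (run as the hdfs user; exact paths may differ on your system):

cd /usr/share/doc/oozie
tar -xzf oozie-examples.tar.gz
hdfs dfs -put examples /user/hdfs/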

The command I used to run the example was:

oozie job -config /usr/share/doc/oozie/examples/apps/spark/job.properties -run

The contents of job.properties:

nameNode=hdfs://cluster:8020
jobTracker=aleo-master-0:8021
master=yarn-cluster
queueName=default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/spark

And workflow.xml:

<workflow-app xmlns='uri:oozie:workflow:0.5' name='SparkFileCopy'>
<start to='spark-node' />

<action name='spark-node'>
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/spark"/>
        </prepare>
        <master>${master}</master>
        <name>Spark-FileCopy</name>
        <class>org.apache.oozie.example.SparkFileCopy</class>
        <jar>${nameNode}/user/${wf:user()}/${examplesRoot}/apps/spark/lib/oozie-examples.jar</jar>
        <arg>${nameNode}/user/${wf:user()}/${examplesRoot}/input-data/text/data.txt</arg>
        <arg>${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/spark</arg>
    </spark>
    <ok to="end" />
    <error to="fail" />
</action>

<kill name="fail">
    <message>Workflow failed, error
        message[${wf:errorMessage(wf:lastErrorNode())}]
    </message>
</kill>
<end name='end' />
</workflow-app>

Version information:

  1. Spark 1.6.0
  2. Oozie 4.1.0-cdh5.7.0

Has anyone seen this problem before? I also tried running SparkPi with my own workflow definition, but the result was the same.

Thanks for the help!

Looks like you have a version mismatch. Check which version of the Spark jars is available under the Oozie lib. - vgunnu
Both /usr/lib/oozie/lib and the sharelib on HDFS contain Spark jars from Cloudera with the correct (1.6.0) version, e.g. spark-core_2.10-1.6.0-cdh5.7.0.jar. The only non-standard component I have is Hive 2.0. - Michał Wyrwalski
I have not used CDH, but in general oozie-4.1.0 doesn't support the Spark action; support was added starting with oozie-4.2.0. - arglee

1 Answer


Did you try to clean up Spark's staging path? Spark copies a temporary copy of the given jar into its HDFS staging path and may not be able to distinguish two different jars with the same name there.
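
For example, you could check for and remove the leftover staging directory and then rerun the workflow. The user and application ID below are taken from the error message in the question; adjust them to your setup:

hdfs dfs -ls /user/hdfs/.sparkStaging
hdfs dfs -rm -r /user/hdfs/.sparkStaging/application_1462195303197_0009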