
I am trying to follow these instructions to enable history logs with the Spark Oozie action: https://archive.cloudera.com/cdh5/cdh/5/oozie/DG_SparkActionExtension.html

To ensure that your Spark job shows up in the Spark History Server, make sure to specify these three Spark configuration properties either in spark-opts with --conf or from oozie.service.SparkConfigurationService.spark.configurations

  1. spark.yarn.historyServer.address=http://SPH-HOST:18088
  2. spark.eventLog.dir=hdfs://NN:8020/user/spark/applicationHistory
  3. spark.eventLog.enabled=true
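
For reference, passed through spark-opts with --conf, the three properties above would look roughly like the following. This is only a sketch: SPH-HOST and NN are the documentation's own placeholders, not real host names, and the ports are the ones quoted above.

```xml
<!-- Sketch only: SPH-HOST and NN are placeholders taken from the documentation. -->
<!-- Note the explicit http:// and hdfs:// schemes on the two addresses. -->
<spark-opts>--conf spark.yarn.historyServer.address=http://SPH-HOST:18088 --conf spark.eventLog.dir=hdfs://NN:8020/user/spark/applicationHistory --conf spark.eventLog.enabled=true</spark-opts>
```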

The workflow definition looks like this:

<action name="spark-9e7c">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <mode>cluster</mode>
        <name>Correlation Engine</name>
        <class>Main Class</class>
        <jar>hdfs://<MACHINE IP>:8020/USER JAR</jar>
        <spark-opts>--conf spark.eventLog.dir=<MACHINE IP>:8020/user/spark/applicationHistory --conf spark.eventLog.enabled=true --conf spark.yarn.historyServer.address=<MACHINE IP>:18088/</spark-opts>
    </spark>
    <ok to="email-f5d5"/>
    <error to="email-a687"/>
</action>

When I run the job from a shell script, the history logs are written correctly, but when it runs as an Oozie action they are not. I have set all three properties.

Hi, please check my answer. Instead of using spark-opts, try passing the arguments as shown in my answer. - Ram Ghadiyaram
If you are okay with the answer, please mark it as accepted. Thanks. - Ram Ghadiyaram
Thanks so much for your prompt response, RamPrasad. I moved the properties into the configuration section as you recommended. Now I can see some logs in the /user/spark/applicationHistory location as .inprogress files, but I still cannot see any logs in the history server. - Alchemist
It should work. Please check again for any remaining mistakes. - Ram Ghadiyaram
For me, adding the properties to both spark-opts and the configuration section worked. Thanks so much for your help, RamPrasad. - Alchemist

1 Answer


In my experience, you are passing the arguments in the wrong place.

Please refer to the XML snippet below, which sets the properties in the configuration section of the action rather than in spark-opts:

<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns='uri:oozie:workflow:0.4' name='sparkjob'>
    <start to='spark-process' />
    <action name='spark-process'>
        <spark xmlns='uri:oozie:spark-action:0.1'>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>oozie.service.SparkConfigurationService.spark.configurations</name>
                    <value>spark.eventLog.dir=hdfs://node1.analytics.sub:8020/user/spark/applicationHistory,spark.yarn.historyServer.address=http://node1.analytics.sub:18088,spark.eventLog.enabled=true</value>
                </property>
                <!--property>
                    <name>oozie.hive.defaults</name>
                    <value>/user/ambari-qa/sparkActionPython/hive-config.xml</value>
                </property-->
                <!--property>
                    <name>oozie.use.system.libpath</name>
                    <value>true</value>
                </property-->
                <property>
                    <name>oozie.service.WorkflowAppService.system.libpath</name>
                    <value>/user/oozie/share/lib/lib_20150831190253/spark</value>
                </property>
            </configuration>
            <master>yarn-client</master>
            <!--master>local[4]</master-->
            <mode>client</mode>
            <name>wordcount</name>
            <jar>/usr/hdp/current/spark-client/AnalyticsJar/wordcount.py</jar>
            <spark-opts>--executor-memory 1G --driver-memory 1G --executor-cores 4 --num-executors 2 --jars /usr/hdp/current/spark-client/lib/spark-assembly-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar</spark-opts>
        </spark>
        <ok to='end'/>
        <error to='spark-fail'/>
    </action>
    <kill name='spark-fail'>
        <message>Spark job failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <end name='end' />
</workflow-app>
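
According to the comments on the question, the history only showed up in the History Server once the properties were set in both the configuration section and spark-opts. A minimal sketch of that combination is below. It is not a tested workflow: the host node1.analytics.sub and the ports are simply reused from the snippet above, and the class name and jar path are hypothetical placeholders to replace with your own.

```xml
<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
        <!-- Same property as in the snippet above; the host is a placeholder. -->
        <property>
            <name>oozie.service.SparkConfigurationService.spark.configurations</name>
            <value>spark.eventLog.dir=hdfs://node1.analytics.sub:8020/user/spark/applicationHistory,spark.yarn.historyServer.address=http://node1.analytics.sub:18088,spark.eventLog.enabled=true</value>
        </property>
    </configuration>
    <master>yarn-cluster</master>
    <mode>cluster</mode>
    <name>Correlation Engine</name>
    <!-- Hypothetical class name and jar path, for illustration only. -->
    <class>com.example.Main</class>
    <jar>${nameNode}/path/to/your-app.jar</jar>
    <!-- The same three properties repeated via conf flags, with explicit hdfs:// and http:// schemes. -->
    <spark-opts>--conf spark.eventLog.dir=hdfs://node1.analytics.sub:8020/user/spark/applicationHistory --conf spark.eventLog.enabled=true --conf spark.yarn.historyServer.address=http://node1.analytics.sub:18088</spark-opts>
</spark>
```

If event logs still appear only as .inprogress files, it may be worth checking that spark.eventLog.dir points at the same directory the History Server is configured to read.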