
I am trying to run a standalone Spark application in yarn-client mode (without spark-submit). I put spark-assembly-1.1.0-hadoop2.4.0.jar and the Hadoop conf directory (containing yarn-site.xml) on the classpath, but YARN is not picking up the ResourceManager address from yarn-site.xml; instead it defaults to port 8032.
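For reference, the 8032 default comes from YARN's built-in yarn-default.xml; the client only connects elsewhere if it can load a yarn-site.xml that overrides `yarn.resourcemanager.address`. A minimal fragment (the host name is a placeholder):

```xml
<!-- yarn-site.xml: must be visible on the client's classpath -->
<property>
  <name>yarn.resourcemanager.address</name>
  <value>your-rm-host:8032</value>
</property>
```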

Thank you


1 Answer


Please note that if you have more than one path on your classpath, HADOOP_CONF_DIR must come first.

This is my application's boot script:

set HADOOP_CONF_DIR=D:\data\yarnv2_5\Hadoop\etc\hadoop
set PATH=%PATH%;D:\data\AppDependencies\jdk1.8\bin
set JAVA_HOME=D:\data\AppDependencies\jdk1.8
set AKKA_HOME=%~dp0
set JAVA_OPTS=-Xmx100g -Xms1024M -Xss1M -XX:MaxPermSize=256M -XX:+UseParallelGC -Dfile.encoding=UTF8
set AKKA_CLASSPATH=%AKKA_HOME%\*

rem The order matters! Be sure to put HADOOP_CONF_DIR in the first place.
set APP_CLASSPATH=%HADOOP_CONF_DIR%;%AKKA_CLASSPATH%

java %JAVA_OPTS% -cp "%APP_CLASSPATH%" com.Application

If you put more than one path on the classpath, Hadoop (or more precisely, the JVM's classloader) stops at the first matching config file it finds. If the Spark assembly jar comes before HADOOP_CONF_DIR, the copy bundled inside spark-assembly-1.x.x-hadoop-2.x.x.jar wins, and the config in HADOOP_CONF_DIR has no effect.
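To see which copy of the config actually wins, you can ask the classloader directly; the first URL it returns is the one that will be loaded. A small sketch (the class and helper names are mine):

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ClasspathCheck {
    // Returns every copy of a resource visible on the classpath,
    // in lookup order: the first entry is the one that "wins".
    static List<URL> findAll(String resource) throws Exception {
        List<URL> urls = new ArrayList<>(Collections.list(
                ClasspathCheck.class.getClassLoader().getResources(resource)));
        return urls;
    }

    public static void main(String[] args) throws Exception {
        // Print each yarn-site.xml found; if the first one is inside
        // the spark-assembly jar, your HADOOP_CONF_DIR is being shadowed.
        for (URL url : findAll("yarn-site.xml")) {
            System.out.println(url);
        }
    }
}
```

Running this from your boot script with the same classpath as the application makes the shadowing immediately visible.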