I was running a Spark application as a query service (much like spark-shell, but inside the servlet container of Spring Boot) with Spark 1.0.2 in standalone mode. After upgrading to Spark 1.3.1 and trying to use YARN instead of the standalone cluster, things have gone south for me. I created an uber jar with all dependencies (spark-core, spark-yarn, spring-boot) and tried to deploy my application.
15/07/29 11:19:26 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/07/29 11:19:27 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/07/29 11:19:28 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/07/29 11:19:29 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
I also tried excluding the spark-yarn dependency and supplying it at runtime, but I hit the same exception. We use the MapR distribution, and they told me it's not possible to run Spark jobs on YARN without the spark-submit script. I could try launching my webapp through that script, since my build artifact is a Spring Boot jar (not a war), but that just doesn't feel right: I should be able to initialize the service from my container, not the other way around.
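(Presumably the spark-submit invocation would look something like the line below, with the driver running in yarn-client mode; I haven't verified this works with a Spring Boot fat jar.)

spark-submit --master yarn-client --class com.myapp.Application myspringbootapp.jar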
EDIT 1: How I launch my application. I launch it from a machine where the Hadoop client is installed and configured:
java -cp myspringbootapp.jar com.myapp.Application
com.myapp.Application in turn creates a SparkContext as a Spring-managed bean, which I later use to serve user requests.
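Roughly, the bean wiring looks like this (simplified sketch; the class name and app name are placeholders, and I show the Java API here, but the idea is the same with a plain SparkContext):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Simplified Spring configuration; "query-service" and the class name are placeholders.
// The master is yarn-client, so the driver runs inside this webapp and YARN only hosts executors.
@Configuration
public class SparkConfig {

    @Bean
    public SparkConf sparkConf() {
        return new SparkConf()
                .setAppName("query-service")
                .setMaster("yarn-client");
    }

    // destroyMethod = "stop" shuts the context down when the Spring container closes.
    @Bean(destroyMethod = "stop")
    public JavaSparkContext sparkContext(SparkConf conf) {
        return new JavaSparkContext(conf);
    }
}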