
I deployed a Hadoop (YARN + Spark) cluster on Google Compute Engine with one master and two slaves. When I run the following command:

spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 1 --driver-memory 1g --executor-memory 1g --executor-cores 1 /home/hadoop/spark-install/lib/spark-examples-1.1.0-hadoop2.4.0.jar 10

the job just keeps running, and every second I get a message similar to this:


15/02/06 22:47:12 INFO yarn.Client: Application report from ResourceManager:
         application identifier: application_1423247324488_0008
         appId: 8
         clientToAMToken: null
         appDiagnostics:
         appMasterHost: hadoop-w-zrem.c.myapp.internal
         appQueue: default
         appMasterRpcPort: 0
         appStartTime: 1423261517468
         yarnAppState: RUNNING
         distributedFinalState: UNDEFINED
         appTrackingUrl: http://hadoop-m-xxxx:8088/proxy/application_1423247324488_0008/
         appUser: achitre

2 Answers


Instead of --master yarn-cluster, use --master yarn-client. In yarn-client mode the driver runs in your local process, so the job's output appears in your console instead of only these periodic application reports from the ResourceManager.
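
For example, the command from the question, unchanged apart from the master flag:

spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 1 --driver-memory 1g --executor-memory 1g --executor-cores 1 /home/hadoop/spark-install/lib/spark-examples-1.1.0-hadoop2.4.0.jar 10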


After adding the following line to my script, it worked:

export SPARK_JAVA_OPTS="-Dspark.yarn.executor.memoryOverhead=1024 -Dspark.local.dir=/tmp -Dspark.executor.memory=1024"

I guess we shouldn't use suffixes like 'm' or 'g' when specifying these memory values; otherwise we get a NumberFormatException.
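
As a rough alternative sketch (assuming Spark 1.1, where SPARK_JAVA_OPTS is deprecated in favor of spark-submit's --conf flag), the same properties can be passed per job. In Spark 1.x, spark.yarn.executor.memoryOverhead takes a plain number of megabytes, which is likely why a 'g' or 'm' suffix triggers the NumberFormatException; --executor-memory already sets spark.executor.memory, so it isn't repeated here:

spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster \
    --num-executors 1 --driver-memory 1g --executor-memory 1g --executor-cores 1 \
    --conf spark.yarn.executor.memoryOverhead=1024 \
    --conf spark.local.dir=/tmp \
    /home/hadoop/spark-install/lib/spark-examples-1.1.0-hadoop2.4.0.jar 10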