When running Spark 1.3.0 Pi example on YARN (Hadoop 2.6.0.2.2.0.0-2041) with the following script:
# Run on a YARN cluster
export HADOOP_CONF_DIR=/etc/hadoop/conf
/var/home2/test/spark/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--executor-memory 3G \
--num-executors 50 \
/var/home2/test/spark/lib/spark-examples-1.3.0-hadoop2.4.0.jar \
1000
It fails with "Application failed 2 times due to AM Container" message (please see below). As far as I understand, all neccessary information to run Spark application in YARN mode is provided in this launch script. What else should be configured to run on YARN. What is missing? Other reasons for YARN launch to fail?
[test@etl-hdp-mgmt pi]$ ./run-pi.sh
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/04/01 12:59:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/04/01 12:59:58 INFO client.RMProxy: Connecting to ResourceManager at etl-hdp-yarn.foo.bar.com/192.168.0.16:8050
15/04/01 12:59:58 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
15/04/01 12:59:58 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (4096 MB per container)
15/04/01 12:59:58 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/04/01 12:59:58 INFO yarn.Client: Setting up container launch context for our AM
15/04/01 12:59:58 INFO yarn.Client: Preparing resources for our AM container
15/04/01 12:59:59 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
15/04/01 12:59:59 INFO yarn.Client: Uploading resource file:/var/home2/test/spark-1.3.0-bin-hadoop2.4/lib/spark-assembly-1.3.0-hadoop2.4.0.jar -> hdfs://foo.bar.com:8020/user/test/.sparkStaging/application_1427875242006_0010/spark-assembly-1.3.0-hadoop2.4.0.jar
15/04/01 13:00:01 INFO yarn.Client: Uploading resource file:/var/home2/test/spark/lib/spark-examples-1.3.0-hadoop2.4.0.jar -> hdfs://foo.bar.com:8020/user/test/.sparkStaging/application_1427875242006_0010/spark-examples-1.3.0-hadoop2.4.0.jar
15/04/01 13:00:02 INFO yarn.Client: Setting up the launch environment for our AM container
15/04/01 13:00:03 INFO spark.SecurityManager: Changing view acls to: test
15/04/01 13:00:03 INFO spark.SecurityManager: Changing modify acls to: test
15/04/01 13:00:03 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(test); users with modify permissions: Set(test)
15/04/01 13:00:03 INFO yarn.Client: Submitting application 10 to ResourceManager
15/04/01 13:00:03 INFO impl.YarnClientImpl: Submitted application application_1427875242006_0010
15/04/01 13:00:04 INFO yarn.Client: Application report for application_1427875242006_0010 (state: ACCEPTED)
15/04/01 13:00:04 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1427893202566
final status: UNDEFINED
tracking URL: http://etl-hdp-yarn.foo.bar.com:8088/proxy/application_1427875242006_0010/
user: test
15/04/01 13:00:05 INFO yarn.Client: Application report for application_1427875242006_0010 (state: ACCEPTED)
15/04/01 13:00:06 INFO yarn.Client: Application report for application_1427875242006_0010 (state: ACCEPTED)
15/04/01 13:00:07 INFO yarn.Client: Application report for application_1427875242006_0010 (state: ACCEPTED)
15/04/01 13:00:08 INFO yarn.Client: Application report for application_1427875242006_0010 (state: ACCEPTED)
15/04/01 13:00:09 INFO yarn.Client: Application report for application_1427875242006_0010 (state: FAILED)
15/04/01 13:00:09 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1427875242006_0010 failed 2 times due to AM Container for appattempt_1427875242006_0010_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://etl-hdp-yarn.foo.bar.com:8088/proxy/application_1427875242006_0010/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1427875242006_0010_02_000001
Exit code: 1
Exception message: /mnt/hdfs01/hadoop/yarn/local/usercache/test/appcache/application_1427875242006_0010/container_1427875242006_0010_02_000001/launch_container.sh: line 27: $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /mnt/hdfs01/hadoop/yarn/local/usercache/test/appcache/application_1427875242006_0010/container_1427875242006_0010_02_000001/launch_container.sh: line 27: $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1427893202566
final status: FAILED
tracking URL: http://etl-hdp-yarn.foo.bar.com:8088/cluster/app/application_1427875242006_0010
user: test
Exception in thread "main" org.apache.spark.SparkException: Application finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:622)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:647)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
– zorkorg.apache.spark.deploy.yarn.ApplicationMaster
should be in the jar thatspark-submit
use to create Spark environmet. I don't think application jar should include this class. – zork