
I tried to start spark-shell with:

spark-shell --master yarn-client

I get into the shell, but a few seconds later I see this warning:

 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:38171] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].

This error is repeated many times in the YARN log file:

15/02/23 20:37:26 INFO yarn.YarnAllocationHandler: Completed container container_1424684000430_0001_02_000002 (state: COMPLETE, exit status: 1)
15/02/23 20:37:26 INFO yarn.YarnAllocationHandler: Container marked as failed: container_1424684000430_0001_02_000002. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_1424684000430_0001_02_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1
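
The generic ExitCodeException above rarely shows the real cause; the container's own stdout/stderr usually does. A minimal way to pull them with the YARN CLI, assuming log aggregation is enabled (the application ID is copied from the container name in the log above):

# Fetch the aggregated logs for the failed application.
yarn logs -applicationId application_1424684000430_0001

If aggregation is off, the same files sit under the NodeManager's local userlogs directory on each worker node.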

I also noticed this line:

15/02/23 21:00:20 INFO yarn.ExecutorRunnable: Setting up executor with commands: List($JAVA_HOME/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms1024m -Xmx1024m , -Djava.io.tmpdir=$PWD/tmp, '-Dspark.driver.port=33837', -Dspark.yarn.app.container.log.dir=<LOG_DIR>, org.apache.spark.executor.CoarseGrainedExecutorBackend, akka.tcp://[email protected]:33837/user/CoarseGrainedScheduler, 4, vbox-lubuntu, 1, application_1424684000430_0003, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)

The strange part is -Dspark.yarn.app.container.log.dir=<LOG_DIR>. It looks like the variable is not being expanded, even though I think I have already defined it.
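
(For reference, <LOG_DIR> is a placeholder that the NodeManager substitutes at container launch, so seeing it literally in this INFO line may be normal.) One way to double-check where container logs are configured to go, a sketch assuming $HADOOP_CONF_DIR points at the cluster's configuration directory:

# Show the configured NodeManager log directories; if the property
# is absent, YARN falls back to its default location.
grep -B1 -A2 'yarn.nodemanager.log-dirs' $HADOOP_CONF_DIR/yarn-site.xml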

P.S. spark-submit seems to be working:

spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster /path/to/lib/spark-examples-1.2.1-hadoop2.4.0.jar

1 Answer


Based on the discussion in this thread, the problem is caused by an out-of-memory condition (OOM) in the container. The only solution is to raise the system memory...
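
Once more physical memory is actually available, the sizes Spark requests per JVM can also be raised at launch. A minimal sketch with illustrative, untuned values, assuming Spark 1.x where spark.yarn.executor.memoryOverhead sets the extra headroom YARN reserves per executor:

# Request larger heaps for the driver and executors, plus some
# off-heap overhead so YARN does not kill the container.
spark-shell --master yarn-client \
  --driver-memory 2g \
  --executor-memory 2g \
  --conf spark.yarn.executor.memoryOverhead=512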

The error message is really misleading.