0
votes

I'm stuck on an issue running the Spark Pi example on HDP 2.0.

I downloaded the Spark 1.0 pre-built package (for HDP2) from http://spark.apache.org/downloads.html and ran the example from the Spark website:

    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
      --master yarn-cluster \
      --num-executors 3 \
      --driver-memory 2g \
      --executor-memory 2g \
      --executor-cores 1 \
      ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 2

I get this error:

Application application_1404470405736_0044 failed 3 times due to AM Container for appattempt_1404470405736_0044_000003 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
.Failing this attempt.. Failing the application.

Unknown/unsupported param List(--executor-memory, 2048, --executor-cores, 1, --num-executors, 3)
Usage: org.apache.spark.deploy.yarn.ApplicationMaster [options]
Options:
  --jar JAR_PATH       Path to your application's JAR file (required)
  --class CLASS_NAME   Name of your application's main class (required)
  ...bla-bla-bla

Any ideas? How can I make it work?

1
I think it's pretty obvious that you aren't passing the parameters correctly: Unknown/unsupported param List(--executor-memory, 2048, --executor-cores, 1, --num-executors, 3). I recommend looking at the Options that you cut short with ...bla-bla-bla - aaronman

1 Answer

3
votes

I had the same problem. The reason was that the version of spark-assembly.jar in HDFS differs from your currently installed Spark version: the older ApplicationMaster in the HDFS copy only understands the old --num-workers/--worker-memory style of options, so it rejects the --num-executors/--executor-memory flags that Spark 1.0's spark-submit passes. A quick way to compare the two jars is sketched below.
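As a rough check, you can compare the assembly jar that YARN picks up from HDFS with the one bundled in your local Spark download. The HDFS path and jar names below are only assumptions (clusters stage the assembly in different locations), so substitute whatever your setup actually uses:

    # hypothetical staging path -- use the location where your cluster keeps the Spark assembly
    hdfs dfs -ls /user/spark/share/lib/
    # compare with the assembly that ships with the Spark 1.0.0 download you run spark-submit from
    ls ./lib/spark-assembly-*.jar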

For example, here is the parameter list of org.apache.spark.deploy.yarn.Client from the assembly jar stored in HDFS:

  $ hadoop jar ./spark-assembly.jar  org.apache.spark.deploy.yarn.Client --help
Usage: org.apache.spark.deploy.yarn.Client [options] 
Options:
  --jar JAR_PATH             Path to your application's JAR file (required in yarn-cluster mode)
  --class CLASS_NAME         Name of your application's main class (required)
  --args ARGS                Arguments to be passed to your application's main class.
                             Mutliple invocations are possible, each will be passed in order.
  --num-workers NUM          Number of workers to start (Default: 2)
  --worker-cores NUM         Number of cores for the workers (Default: 1). This is unsused right now.
  --master-class CLASS_NAME  Class Name for Master (Default: spark.deploy.yarn.ApplicationMaster)
  --master-memory MEM        Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
  --worker-memory MEM        Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
  --name NAME                The name of your application (Default: Spark)
  --queue QUEUE              The hadoop queue to use for allocation requests (Default: 'default')
  --addJars jars             Comma separated list of local jars that want SparkContext.addJar to work with.
  --files files              Comma separated list of files to be distributed with the job.
  --archives archives        Comma separated list of archives to be distributed with the job.

And here is the same help output from the newer, locally installed spark-assembly jar:

$ hadoop jar ./spark-assembly-1.0.0-cdh5.1.0-hadoop2.3.0-cdh5.1.0.jar org.apache.spark.deploy.yarn.Client
Usage: org.apache.spark.deploy.yarn.Client [options] 
Options:
  --jar JAR_PATH             Path to your application's JAR file (required in yarn-cluster mode)
  --class CLASS_NAME         Name of your application's main class (required)
  --arg ARGS                 Argument to be passed to your application's main class.
                             Multiple invocations are possible, each will be passed in order.
  --num-executors NUM        Number of executors to start (Default: 2)
  --executor-cores NUM       Number of cores for the executors (Default: 1).
  --driver-memory MEM        Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
  --executor-memory MEM      Memory per executor (e.g. 1000M, 2G) (Default: 1G)
  --name NAME                The name of your application (Default: Spark)
  --queue QUEUE              The hadoop queue to use for allocation requests (Default: 'default')
  --addJars jars             Comma separated list of local jars that want SparkContext.addJar to work with.
  --files files              Comma separated list of files to be distributed with the job.
  --archives archives        Comma separated list of archives to be distributed with the job.

So I replaced the spark-assembly.jar in HDFS with the one from my current Spark installation, and Spark started working correctly; a rough sketch of that update follows.
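A minimal sketch of the update, again assuming a hypothetical HDFS staging path of /user/spark/share/lib/ (substitute your cluster's actual path and jar names):

    # remove the stale assembly from HDFS (path is an assumption)
    hdfs dfs -rm /user/spark/share/lib/spark-assembly.jar
    # upload the assembly matching the Spark version you submit with
    hdfs dfs -put ./lib/spark-assembly-1.0.0-hadoop2.2.0.jar /user/spark/share/lib/spark-assembly.jar

If your setup references the assembly explicitly (for example via the SPARK_JAR environment variable), point it at the newly uploaded file as well.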