2 votes

I am new to Spark.

I am trying to run Spark on YARN in yarn-client mode.

Spark version: 1.0.2, Hadoop version: 2.2.0

The YARN cluster has 3 live nodes.

Properties set in spark-env.sh:

SPARK_EXECUTOR_MEMORY=1G
SPARK_EXECUTOR_INSTANCES=3
SPARK_EXECUTOR_CORES=1
SPARK_DRIVER_MEMORY=2G

Command used: /bin/spark-shell --master yarn-client

But after the spark-shell comes up, it registers only 1 executor, with some default memory assigned to it.

I confirmed via the Spark web UI as well that there is only 1 executor, and it is on the master node (the YARN ResourceManager node).
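You can also double-check this from inside the shell itself. A sketch, assuming Spark 1.x where sc.getExecutorStorageStatus is a developer API whose report includes the driver's own block manager:

sc.getExecutorStorageStatus.length - 1  // registered executors, excluding the driver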

INFO yarn.Client: Command for starting the Spark ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx2048m, -Djava.io.tmpdir=$PWD/tmp, -Dspark.tachyonStore.folderName=\"spark-fc6383cc-0904-4af9-8abd-3b66b3f0f461\", -Dspark.yarn.secondary.jars=\"\", -Dspark.home=\"/home/impadmin/spark-1.0.2-bin-hadoop2\", -Dspark.repl.class.uri=\"http://master_node:46823\", -Dspark.driver.host=\"master_node\", -Dspark.app.name=\"Spark shell\", -Dspark.jars=\"\", -Dspark.fileserver.uri=\"http://master_node:46267\", -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"41209\", -Dspark.httpBroadcast.uri=\"http://master_node:36965\", -Dlog4j.configuration=log4j-spark-container.properties, org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused, --jar , null, --args 'master_node:41209' , --executor-memory, 1024, --executor-cores, 1, --num-executors , 3, 1>, /stdout, 2>, /stderr)

...

...

...

14/09/10 22:21:24 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@master_node:53619/user/Executor#1075999905] with ID 1
14/09/10 22:21:24 INFO storage.BlockManagerInfo: Registering block manager master_node:40205 with 589.2 MB RAM
14/09/10 22:21:25 INFO cluster.YarnClientClusterScheduler: YarnClientClusterScheduler.postStartHook done
14/09/10 22:21:25 INFO repl.SparkILoop: Created spark context.. Spark context available as sc.

And after running any Spark action, at any level of parallelism, it simply runs all the tasks in series on this one node!
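For example, an illustrative job (not the exact one from my session):

sc.parallelize(1 to 1000000, 8).map(_ * 2).count()

Even with 8 partitions, the 8 tasks are executed one after another on that single executor.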


1 Answer

5 votes

OK, I solved it this way. I have 4 data nodes in my cluster:

spark-shell --num-executors 4 --master yarn-client
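If you also want the memory and core settings from spark-env.sh on the command line, a fuller variant (same flags, values taken from the question) would be:

spark-shell --master yarn-client --num-executors 4 --executor-memory 1G --executor-cores 1 --driver-memory 2G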