2 votes

I'm setting up a [somewhat ad-hoc] cluster of Spark workers: namely, a couple of lab machines that I have sitting around. However, I've run into a problem when I attempt to start the cluster with start-all.sh: Spark is installed in different directories on the various workers, but the master invokes $SPARK_HOME/sbin/start-all.sh on each one using the master's own definition of $SPARK_HOME, even though the path is different for each worker.

Assuming I can't install Spark on identical paths on each worker to the master, how can I get the master to recognize the different worker paths?

EDIT #1: Hmm, I found this thread on the Spark mailing list, which strongly suggests that this is the current implementation: it assumes $SPARK_HOME is the same on all workers.
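For concreteness, the kind of per-machine launch I'd apparently have to fall back on looks roughly like this (just a sketch: the installation path and master host below are placeholders, and it bypasses start-all.sh by running the standalone Worker class directly):

# run on each lab machine, using that machine's own installation
# (the path and the master hostname are placeholders)
export SPARK_HOME=/opt/spark
"$SPARK_HOME/bin/spark-class" org.apache.spark.deploy.worker.Worker spark://master:7077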

Would you mind taking a look at my reply to this mailing-list thread? I have a question about configuring a different log4j.properties per worker that I can't seem to overcome. This isn't what I'd use in reality, but for mucking around and understanding what's going on it would be of help. – Brad

2 Answers

0 votes

I'm playing around with Spark on Windows (my laptop) and have two worker nodes running by starting them manually with a script that contains the following:

set SPARK_HOME=C:\dev\programs\spark-1.2.0-worker1
set SPARK_MASTER_IP=master.brad.com 
spark-class org.apache.spark.deploy.worker.Worker spark://master.brad.com:7077 

I then create a copy of this script with a different SPARK_HOME defined, and run my second worker from that. When I kick off a spark-submit I see this on Worker_1:

15/02/13 16:42:10 INFO ExecutorRunner: Launch command: ...C:\dev\programs\spark-1.2.0-worker1\bin...

and this on Worker_2:

15/02/13 16:42:10 INFO ExecutorRunner: Launch command: ...C:\dev\programs\spark-1.2.0-worker2\bin...
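That worker2 path comes from the second copy of the script, which presumably differs only in SPARK_HOME:

rem second worker: only SPARK_HOME changes relative to the first script
set SPARK_HOME=C:\dev\programs\spark-1.2.0-worker2
set SPARK_MASTER_IP=master.brad.com
spark-class org.apache.spark.deploy.worker.Worker spark://master.brad.com:7077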

So it works. In my case I duplicated the Spark installation directory for each worker, but you may be able to get around that.

0 votes

You might want to consider setting the worker directory per machine by changing the SPARK_WORKER_DIR line in each worker's spark-env.sh file.
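A minimal sketch of that change, assuming the standard conf/spark-env.sh mechanism (the directory below is only a placeholder):

# conf/spark-env.sh on this worker; points the worker's scratch/work space at a local path
export SPARK_WORKER_DIR=/data/spark/work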