22
votes

I am trying to set up a Spark standalone cluster following the official documentation.

My master is a local VM running Ubuntu, and I also have one worker running on the same machine. The worker connects, and I am able to see its status in the master's WebUI.

Here is the WebUI image: [screenshot of the master's WebUI showing the registered worker]

But when I try to connect a slave from another machine, it fails.

This is the log message I get on the worker when I start it from the other machine. I have tried running start-slaves.sh from the master after updating conf/slaves, and also start-slave.sh spark://spark:7077 from the slave.

[Master hostname: spark; Worker hostname: worker]

    15/07/01 11:54:16 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@spark:7077] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://sparkMaster@spark:7077]].
    15/07/01 11:54:59 ERROR Worker: All masters are unresponsive! Giving up.
    15/07/01 11:54:59 INFO Utils: Shutdown hook called

When I try to telnet from the slave to the master, this is what I get:

    root@worker:~# telnet spark 7077
    Trying 10.xx.xx.xx...
    Connected to spark.
    Escape character is '^]'.
    Connection closed by foreign host.

Telnet seems to work, but the connection is closed as soon as it is established. Could this have something to do with the problem?
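One quick thing worth ruling out is name resolution: on Ubuntu, /etc/hosts often maps the machine's own hostname to 127.0.1.1, which makes the master bind to a loopback address that remote workers cannot reach. A minimal sketch of a check, assuming `getent` is available (the `check_resolution` helper is hypothetical, not part of Spark):

```shell
# Hypothetical helper: report whether a hostname resolves to a loopback
# address. Run it on BOTH the master and the worker for the hostname
# "spark"; "loopback" on either machine points at an /etc/hosts problem.
check_resolution() {
  # getent ahostsv4 forces IPv4 resolution; take the first address.
  addr=$(getent ahostsv4 "$1" | awk '{print $1; exit}')
  case "$addr" in
    "")    echo "unresolved" ;;
    127.*) echo "loopback" ;;
    *)     echo "routable" ;;
  esac
}

check_resolution spark
```

If this prints "loopback" on the master, the master is advertising an address the worker can never connect to, which would match the symptom of telnet connecting and then being dropped immediately.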

I have added the master and slave IP addresses in /etc/hosts on both machines. I followed all the solutions given at SPARK + Standalone Cluster: Cannot start worker from another machine, but they have not worked for me.

I have the following config set in spark-env.sh on both machines:

    export SPARK_MASTER_IP=spark
    export SPARK_WORKER_PORT=44444

Any help is greatly appreciated.


5 Answers

28
votes

I encountered the exact same problem and just figured out how to get it to work.

The problem is that your Spark master is listening on its hostname (spark in your example), which lets the worker on the same host register successfully, but makes registration from another machine with the command start-slave.sh spark://spark:7077 fail.

The solution is to make sure the value of SPARK_MASTER_IP is specified as an IP address in the file conf/spark-env.sh:

    SPARK_MASTER_IP=<your host ip>

on your master node, and then start your Spark master as normal. You can open the web GUI to confirm that your Spark master appears as spark://YOUR_HOST_IP:7077 after the start. Then, on another machine, the command start-slave.sh spark://<your host ip>:7077 should start a worker and register it with the master successfully.

Hope this helps.
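Putting the steps above together, a minimal sketch of the sequence (192.168.0.10 is a placeholder IP I am inventing here; substitute your actual master IP):

```shell
# conf/spark-env.sh on the master node
# (192.168.0.10 is a placeholder for your real, routable master IP)
export SPARK_MASTER_IP=192.168.0.10

# Then start the master on the master node:
#   ./sbin/start-master.sh
# And register a worker from each other machine:
#   ./sbin/start-slave.sh spark://192.168.0.10:7077
```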

14
votes

It depends on your Spark version; different versions need different configuration. If your Spark version is 1.6, add this line to conf/spark-env.sh so another machine can connect to the master:

    SPARK_MASTER_IP=your_host_ip

and if your Spark version is 2.x, add these lines to your conf/spark-env.sh:

    SPARK_MASTER_HOST=your_host_ip
    SPARK_LOCAL_IP=your_host_ip

After adding these lines, start Spark:

    ./sbin/start-all.sh

If you have done it right, you can see at <your_host_ip>:8080 that the Spark master URL is spark://<your_host_ip>:7077.

Be careful: your_host_ip should not be localhost; it must be exactly the host IP that you set in conf/spark-env.sh.

After all that, you can connect another machine to the master with the command below:

    ./sbin/start-slave.sh spark://your_host_ip:7077

1
votes

I just launched my own Spark cluster with version 2.10. The way I solved my problem is the following:

 ./sbin/start-master.sh -h <your_ip>
0
votes

None of the solutions above worked for me, but I found one more way to fix the problem: Stackoverflow: spark worker not connecting to master

Please check the configuration file spark-env.sh on your master node. Have you set the SPARK_MASTER_HOST variable to the IP address of the master node? If not, try setting it and restarting the master and slaves. For example, if your master node's IP is 192.168.0.1, you should have SPARK_MASTER_HOST=192.168.0.1 in there. Note that you don't need to set this variable on your slaves.

0
votes
  1. Under the spark/conf directory, copy spark-defaults.conf.template to spark-defaults.conf and open it for editing (Spark reads spark-defaults.conf, not the .template file).

  2. Add the following line:

    spark.master spark://your_hostname:7077

To find your hostname, type hostname in your command prompt.
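For completeness, the edited file could look like this (a sketch; spark is the master hostname from the question above, so substitute the output of hostname on your own master):

```
# conf/spark-defaults.conf
# ("spark" is a placeholder - use your master's hostname or IP)
spark.master    spark://spark:7077
```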