I am trying to set up a Spark standalone cluster following the official documentation.
My master runs on a local Ubuntu VM, and I also have one worker running on the same machine. That worker connects successfully, and I can see its status in the master's Web UI.
Here is a screenshot of the master's Web UI:
But when I try to connect a slave from another machine, it cannot connect to the master.
I have tried running start-slaves.sh on the master after updating conf/slaves, and also running start-slave.sh spark://spark:7077 directly on the slave.
[Master hostname: spark; worker hostname: worker]
This is the log output I get on the worker when I start it on the other machine:
15/07/01 11:54:16 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@spark:7077] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://sparkMaster@spark:7077]].
15/07/01 11:54:59 ERROR Worker: All masters are unresponsive! Giving up.
15/07/01 11:54:59 INFO Utils: Shutdown hook called
When I telnet from the slave to the master on port 7077, this is what I get:
root@worker:~# telnet spark 7077
Trying 10.xx.xx.xx...
Connected to spark.
Escape character is '^]'.
Connection closed by foreign host.
Telnet seems to work, but the connection is closed as soon as it is established. Could this have something to do with the problem?
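A scriptable equivalent of the telnet test above might look like the sketch below. It assumes bash (built with /dev/tcp support, which is the default on Ubuntu) and coreutils `timeout`; the function name `check_port` is hypothetical. Like telnet, it only shows that a TCP connection to the port can be opened; it says nothing about whether the Akka handshake between worker and master succeeds.

```shell
#!/usr/bin/env bash
# Minimal sketch: check whether a TCP connection to HOST:PORT can be opened.
# /dev/tcp/HOST/PORT is a bash redirection feature, not a real file.
# `timeout` keeps an unreachable host from hanging the check.
# check_port is a hypothetical helper name, not part of Spark.
check_port() {
  local host="$1" port="$2"
  # Open fd 3 read-write to the port in a child bash; it exits 0 only if
  # the TCP connection was established (the fd is closed when it exits).
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open: ${host}:${port}"
    return 0
  else
    echo "closed or unreachable: ${host}:${port}"
    return 1
  fi
}
```

On the worker, `check_port spark 7077` would be expected to print `open: spark:7077`, matching the telnet result; a refused or filtered port prints the second message and returns 1.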
I have added the master and slave IP addresses to /etc/hosts on both machines. I also tried all the solutions given in "SPARK + Standalone Cluster: Cannot start worker from another machine", but none of them worked for me.
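To compare what the /etc/hosts entries actually yield on each machine, a small sketch like the one below could be run on both the master and the worker. `show_resolution` is a hypothetical helper; `getent hosts` is the glibc lookup tool, which follows /etc/nsswitch.conf and so normally consults /etc/hosts first. It flags loopback results, because a name that resolves to a loopback address on one machine cannot be used to reach it from the other (Ubuntu typically maps a machine's own hostname to 127.0.1.1 in /etc/hosts).

```shell
#!/usr/bin/env bash
# Minimal sketch: print the address each given hostname resolves to here.
# show_resolution is a hypothetical helper name, not part of Spark.
show_resolution() {
  local name addr
  for name in "$@"; do
    # Take the address column from the first getent result; treat a failed
    # lookup as an empty result instead of aborting the loop.
    addr=$(getent hosts "$name" 2>/dev/null | awk 'NR == 1 { print $1 }') || addr=""
    if [ -z "$addr" ]; then
      echo "${name}: unresolved"
    else
      case "$addr" in
        127.*|::1) echo "${name}: ${addr} (loopback)" ;;
        *)         echo "${name}: ${addr}" ;;
      esac
    fi
  done
}
```

For example, running `show_resolution spark worker` on each machine should print the same non-loopback addresses in both places if the /etc/hosts entries are consistent.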
I have the following settings in spark-env.sh on both machines:
export SPARK_MASTER_IP=spark
export SPARK_WORKER_PORT=44444
Any help is greatly appreciated.