3
votes

The Spark master log includes the following:

15/05/19 21:05:19 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:7077]

But the Worker is not able to connect:

15/05/19 21:27:13 INFO Worker: Connecting to master akka.tcp://[email protected]:7077/user/Master...
15/05/19 21:27:13 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://[email protected]:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: mellyrn.local/25.101.19.24:7077
15/05/19 21:27:25 INFO Worker: Retrying connection to master (attempt # 1)
15/05/19 21:27:25 INFO Worker: Connecting to master akka.tcp://[email protected]:7077/user/Master...
15/05/19 21:27:25 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://[email protected]:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: mellyrn.local/25.101.19.24:7077

Any hints on what to try here?

2
This may happen if the hostname entry is wrong, i.e. the worker is not able to resolve IP -> hostname. - ayan guha
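One way to test the hostname theory from the worker machine is to probe the master's address directly. The following is only a sketch: the helper names master_host, master_port and check_master are made up for illustration, and it assumes an nc (netcat) build that supports the -z and -w options.

```shell
# Hypothetical helpers: split a master URL such as spark://host:7077 into host and port.
master_host() { u="${1#spark://}"; printf '%s\n' "${u%%:*}"; }
master_port() { u="${1#spark://}"; printf '%s\n' "${u##*:}"; }

# Probe the master's port over TCP from the machine where the worker runs.
# Assumes nc supports -z (connect without sending data) and -w (timeout in seconds).
check_master() {
  host="$(master_host "$1")"
  port="$(master_port "$1")"
  if nc -z -w 5 "$host" "$port"; then
    echo "reachable: $host:$port"
  else
    # A name that does not resolve and a refused connection both end up here;
    # the error text printed by nc usually tells the two apart.
    echo "unreachable: $host:$port"
    return 1
  fi
}
```

Running check_master spark://mellyrn.local:7077 on the worker host should report whether anything is accepting connections on that address at that moment.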

2 Answers

1
votes

Check your conf/spark-defaults.conf file and set spark.master to spark://mellyrn.local:7077.
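To confirm what the file currently contains, a small helper like the one below can print the configured value. It is a sketch only: spark_master_value is a hypothetical name, and it assumes the whitespace-separated "key value" layout (for example a line reading spark.master followed by spaces and spark://mellyrn.local:7077), not the key=value form.

```shell
# Hypothetical helper: print the value of spark.master from a Spark defaults file.
# $1 is the path to the file, e.g. conf/spark-defaults.conf in the Spark directory.
spark_master_value() {
  # Assumes whitespace-separated "key value" lines; commented-out lines
  # (first field starting with #) do not match and are skipped.
  awk '$1 == "spark.master" { print $2 }' "$1"
}
```

If spark_master_value conf/spark-defaults.conf prints nothing, the property is either missing, commented out, or written in a form this sketch does not handle.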

Also, remember to configure passwordless SSH access:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

1
votes

It appears these errors were intermittent and occurred because the host machine was completely out of memory at the time. After shutting down some unrelated memory-hogging processes, the errors mostly went away.

There is still a delay of a few tens of seconds before the Master/Worker association is established, which I would like to understand.

Note that no log messages described the low-memory situation.