2 votes

I am using Ubuntu 16 and trying to set up a Spark cluster on my LAN.

I have managed to configure a Spark master, and I can connect a slave from the same machine and see it on localhost:8080.

When I try to connect from another machine, the problems start. I configured passwordless SSH as explained here.
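(For reference, the passwordless SSH setup boils down to something like the following, run on the slave; user and master are placeholders:)

ssh-keygen -t rsa
ssh-copy-id user@master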

When I try to connect to the master using start-slave.sh spark://master:port as explained here,

I get the error log shown below.

I tried accessing the master using both the local IP and the local hostname (I can SSH to the master using both, without a password, both as my user and as root).

I tried ports 6066 and 7077 with both.
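(To rule out a firewall issue, a raw TCP check like this can be run from the slave machine; it assumes netcat is installed, and latitude is the master's hostname from the log below:)

nc -zv latitude 7077
nc -zv latitude 6066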

I don't get an error message, but the new slave is not appearing on the master's localhost:8080 page,

and I keep getting this error log:

Spark Command: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp /usr/local/spark/conf/:/usr/local/spark/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://latitude:6066
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/07/26 22:09:09 INFO Worker: Started daemon with process name: 20609@name-beckup-laptop
17/07/26 22:09:09 INFO SignalUtils: Registered signal handler for TERM
17/07/26 22:09:09 INFO SignalUtils: Registered signal handler for HUP
17/07/26 22:09:09 INFO SignalUtils: Registered signal handler for INT
17/07/26 22:09:09 WARN Utils: Your hostname, name-beckup-laptop resolves to a loopback address: 127.0.1.1; using 192.168.14.84 instead (on interface wlp2s0)
17/07/26 22:09:09 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/07/26 22:09:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/07/26 22:09:09 INFO SecurityManager: Changing view acls to: name
17/07/26 22:09:09 INFO SecurityManager: Changing modify acls to: name
17/07/26 22:09:09 INFO SecurityManager: Changing view acls groups to:
17/07/26 22:09:09 INFO SecurityManager: Changing modify acls groups to:
17/07/26 22:09:09 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(name); groups with view permissions: Set(); users with modify permissions: Set(name); groups with modify permissions: Set()
17/07/26 22:09:09 INFO Utils: Successfully started service 'sparkWorker' on port 34777.
17/07/26 22:09:09 INFO Worker: Starting Spark worker 192.168.14.84:34777 with 4 cores, 14.6 GB RAM
17/07/26 22:09:09 INFO Worker: Running Spark version 2.2.0
17/07/26 22:09:09 INFO Worker: Spark home: /usr/local/spark
17/07/26 22:09:10 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
17/07/26 22:09:10 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://192.168.14.84:8081
17/07/26 22:09:10 INFO Worker: Connecting to master latitude:6066...
17/07/26 22:09:10 WARN Worker: Failed to connect to master latitude:6066
org.apache.spark.SparkException: Exception thrown in awaitResult:
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
        at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:241)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed to connect to latitude/192.168.14.83:6066
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:232)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
        at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
        ... 4 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: latitude/192.168.14.83:6066
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:257)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:291)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:631)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
        ... 1 more

Thanks!


2 Answers

1 vote

Found the problem!

You need to create the file conf/spark-env.sh under your Spark directory (you can copy conf/spark-env.sh.template).

In it, add the following:

SPARK_MASTER_IP='<ip of master without port>'

and then

start-master.sh -h <ip of master> -p 7077

after that

start-slave.sh spark://<master ip>:7077 

will work like a charm.
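Putting it together, the whole sequence looks roughly like this (192.168.14.83 stands in for the master's LAN address from the question, and the paths assume Spark is installed in /usr/local/spark):

# on the master: record the bind address, then start the master on port 7077
echo "SPARK_MASTER_IP='192.168.14.83'" >> /usr/local/spark/conf/spark-env.sh
/usr/local/spark/sbin/start-master.sh -h 192.168.14.83 -p 7077

# on each slave: register against the master's spark:// URL
/usr/local/spark/sbin/start-slave.sh spark://192.168.14.83:7077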

0 votes

I had the same problem, running spark/sbin/start-slave.sh on the master node.

hadoop@master:/opt/spark$ sudo ./sbin/start-slave.sh --master spark://master:7077
starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-master.out
failed to launch: nice -n 0 /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 --master spark://master:7077
  Options:
    -c CORES, --cores CORES  Number of cores to use
    -m MEM, --memory MEM     Amount of memory to use (e.g. 1000M, 2G)
    -d DIR, --work-dir DIR   Directory to run apps in (default: SPARK_HOME/work)
    -i HOST, --ip IP         Hostname to listen on (deprecated, please use --host or -h)
    -h HOST, --host HOST     Hostname to listen on
    -p PORT, --port PORT     Port to listen on (default: random)
    --webui-port PORT        Port for web UI (default: 8081)
    --properties-file FILE   Path to a custom Spark properties file.
                             Default is conf/spark-defaults.conf.
full log in /opt/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-master.out

I found my mistake: I should not use the --master keyword, and should just run the command

hadoop@master:/opt/spark$ sudo ./sbin/start-slave.sh spark://master:7077

I was following the steps of this tutorial: https://phoenixnap.com/kb/install-spark-on-ubuntu
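To check that the worker really came up, jps (shipped with the JDK) should list a Worker process on the node; since start-slave.sh was run with sudo, the JVM belongs to root and jps needs sudo too (the PIDs below are placeholders):

hadoop@master:/opt/spark$ sudo jps
2112 Master
2259 Worker
2301 Jps

The worker should then also appear under "Workers" on the master's web UI on port 8080.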

In addition, my configuration of /opt/spark/conf/spark-env.sh is as follows:

SPARK_MASTER_HOST="master"
JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"

where master is the hostname of my server, as specified in /etc/hosts.
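For reference, the relevant /etc/hosts entries look something like this (the addresses are examples):

192.168.0.10   master
192.168.0.11   worker1

Mapping the hostname to the LAN address, rather than to the default 127.0.1.1 line Ubuntu adds, also avoids the "resolves to a loopback address" warning seen in the question's log.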