In my spark-env.sh I have these settings:
SPARK_LOCAL_IP=127.0.0.1
SPARK_MASTER_HOST=127.0.0.1
SPARK_WORKER_INSTANCES=2
SPARK_WORKER_MEMORY=1000m
SPARK_WORKER_CORES=1
I start the master using start-master.sh, and then I start the slaves/workers using start-slave.sh spark://localhost:7077.
The master web UI shows up fine, but it lists only ONE started worker.
This is the log of the first worker (which is working fine):
Spark Command: /usr/lib/jvm/java-8-openjdk-amd64/bin/java -cp /media/ahmedn1/Ahmedn12/spark/conf/:/media/ahmedn1/Ahmedn12/spark/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://localhost:7077
17/08/30 12:19:31 INFO Worker: Started daemon with process name: 28769@ahmedn1-Inspiron-5555
17/08/30 12:19:31 INFO SignalUtils: Registered signal handler for TERM
17/08/30 12:19:31 INFO SignalUtils: Registered signal handler for HUP
17/08/30 12:19:31 INFO SignalUtils: Registered signal handler for INT
17/08/30 12:19:33 INFO SecurityManager: Changing view acls to: ahmedn1
17/08/30 12:19:33 INFO SecurityManager: Changing modify acls to: ahmedn1
17/08/30 12:19:33 INFO SecurityManager: Changing view acls groups to:
17/08/30 12:19:33 INFO SecurityManager: Changing modify acls groups to:
17/08/30 12:19:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ahmedn1); groups with view permissions: Set(); users with modify permissions: Set(ahmedn1); groups with modify permissions: Set()
17/08/30 12:19:34 INFO Utils: Successfully started service 'sparkWorker' on port 43479.
17/08/30 12:19:35 INFO Worker: Starting Spark worker 127.0.0.1:43479 with 2 cores, 1000.0 MB RAM
17/08/30 12:19:35 INFO Worker: Running Spark version 2.2.0
17/08/30 12:19:35 INFO Worker: Spark home: /media/ahmedn1/Ahmedn12/spark
17/08/30 12:19:35 INFO ExternalShuffleService: Starting shuffle service on port 7337 (auth enabled = false)
17/08/30 12:19:35 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
17/08/30 12:19:35 INFO WorkerWebUI: Bound WorkerWebUI to 127.0.0.1, and started at http://127.0.0.1:8081
17/08/30 12:19:35 INFO Worker: Connecting to master localhost:7077...
17/08/30 12:19:36 INFO TransportClientFactory: Successfully created connection to localhost/127.0.0.1:7077 after 309 ms (0 ms spent in bootstraps)
17/08/30 12:19:37 INFO Worker: Successfully registered with master spark://127.0.0.1:7077
And this is the log of the second worker, which apparently failed to start:
Spark Command: /usr/lib/jvm/java-8-openjdk-amd64/bin/java -cp /media/ahmedn1/Ahmedn12/spark/conf/:/media/ahmedn1/Ahmedn12/spark/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8082 spark://localhost:7077
17/08/30 12:19:34 INFO Worker: Started daemon with process name: 28819@ahmedn1-Inspiron-5555
17/08/30 12:19:34 INFO SignalUtils: Registered signal handler for TERM
17/08/30 12:19:34 INFO SignalUtils: Registered signal handler for HUP
17/08/30 12:19:34 INFO SignalUtils: Registered signal handler for INT
17/08/30 12:19:36 INFO SecurityManager: Changing view acls to: ahmedn1
17/08/30 12:19:36 INFO SecurityManager: Changing modify acls to: ahmedn1
17/08/30 12:19:36 INFO SecurityManager: Changing view acls groups to:
17/08/30 12:19:36 INFO SecurityManager: Changing modify acls groups to:
17/08/30 12:19:36 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ahmedn1); groups with view permissions: Set(); users with modify permissions: Set(ahmedn1); groups with modify permissions: Set()
17/08/30 12:19:37 INFO Utils: Successfully started service 'sparkWorker' on port 46067.
17/08/30 12:19:38 INFO Worker: Starting Spark worker 127.0.0.1:46067 with 2 cores, 1000.0 MB RAM
17/08/30 12:19:38 INFO Worker: Running Spark version 2.2.0
17/08/30 12:19:38 INFO Worker: Spark home: /media/ahmedn1/Ahmedn12/spark
17/08/30 12:19:38 INFO ExternalShuffleService: Starting shuffle service on port 7337 (auth enabled = false)
17/08/30 12:19:38 ERROR Inbox: Ignoring error
java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:127)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:501)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1218)
    at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:496)
    at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:481)
    at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:965)
    at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:210)
    at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:353)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    at java.lang.Thread.run(Thread.java:748)
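So the problem seems to be address binding, probably a port conflict: the BindException appears right after "Starting shuffle service on port 7337", and the first worker reported the same port 7337 for its shuffle service. But isn't Spark supposed to select a free port automatically?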
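A minimal sketch of the two binding behaviors that seem to be involved, written with plain java.net.ServerSocket for illustration (an assumption: Spark itself binds through Netty, as the stack trace shows, so this only mirrors the behavior). Binding to port 0 lets the OS pick a free port, which matches how the 'sparkWorker' service came up on a different port in each log (43479 and 46067). Binding a second socket to a port that is already taken throws the same java.net.BindException: Address already in use, which matches both workers reporting the shuffle service on 7337. The class PortBindDemo and its methods are hypothetical names.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class PortBindDemo {

    // Port 0 asks the OS for any free ephemeral port, similar to how each
    // worker's 'sparkWorker' service ended up on a different port.
    public static int bindEphemeral() throws IOException {
        try (ServerSocket s = new ServerSocket()) {
            s.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            return s.getLocalPort();
        }
    }

    // Binding two listening sockets to the same fixed port: the second bind
    // fails with java.net.BindException, like the second worker's log.
    public static boolean secondBindOnSamePortFails() throws IOException {
        try (ServerSocket first = new ServerSocket()) {
            first.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            // Reuse the port the first socket holds, standing in for a fixed
            // port such as 7337 without depending on 7337 being free here.
            int fixedPort = first.getLocalPort();
            try (ServerSocket second = new ServerSocket()) {
                second.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), fixedPort));
                return false; // unexpected: the second bind succeeded
            } catch (BindException e) {
                System.out.println("Second bind failed: " + e.getMessage());
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("OS-assigned port: " + bindEphemeral());
        System.out.println("Fixed-port conflict reproduced: " + secondBindOnSamePortFails());
    }
}
```

Running main prints an OS-chosen port and then reports whether the fixed-port conflict was reproduced; the exact exception message can vary slightly between JDK versions and platforms.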