I am trying to run a simple Spark job via spark-shell, and it looks like the BlockManager for the spark-shell driver listens on localhost instead of the configured IP address, which causes the job to fail. The exception thrown is "Failed to connect to localhost".
Here is my configuration:
Machine 1 (ubunt64): Spark master [192.168.253.136]
Machine 2 (ubuntu64server): Spark slave [192.168.253.137]
Machine 3 (ubuntu64server2): Spark shell client [192.168.253.138]
Spark version: spark-1.3.0-bin-hadoop2.4
Environment: Ubuntu 14.04
Source code to be executed in the Spark shell:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

// Point the driver at the standalone master and bind it to this machine's IP
val conf = new SparkConf().setMaster("spark://192.168.253.136:7077")
conf.set("spark.driver.host", "192.168.253.138")
conf.set("spark.local.ip", "192.168.253.138")

// Stop the SparkContext the shell created and build a new one with this conf
sc.stop()
val sc = new SparkContext(conf)

val textFile = sc.textFile("README.md")
textFile.count()
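For what it's worth, the same properties can also be passed on the spark-shell command line, so the SparkContext the shell builds at startup already uses them (a sketch using the IPs from the setup above; adjust the path to your Spark install):

./bin/spark-shell \
  --master spark://192.168.253.136:7077 \
  --conf spark.driver.host=192.168.253.138 \
  --conf spark.local.ip=192.168.253.138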
The above code works fine if I run it on Machine 2, where the slave is running, but it fails on Machine 1 (master) and Machine 3 (Spark shell client).
I am not sure why the spark shell advertises localhost instead of the configured IP address. I have set SPARK_LOCAL_IP on Machine 3 in spark-env.sh as well as in .bashrc (export SPARK_LOCAL_IP=192.168.253.138). I confirmed that the spark-shell Java process does listen on port 44015, yet it still broadcasts the localhost address.
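For reference, this is roughly what I have on Machine 3 (a sketch of my setup; SPARK_LOCAL_IP is the variable documented in the Spark standalone configuration docs):

# conf/spark-env.sh on the spark-shell machine (Machine 3)
export SPARK_LOCAL_IP=192.168.253.138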
Any help troubleshooting this issue would be highly appreciated; I am probably missing some configuration setting.
Logs:
scala> val textFile = sc.textFile("README.md")
15/04/22 18:15:22 INFO MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=280248975
15/04/22 18:15:22 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 267.1 MB)
15/04/22 18:15:22 INFO MemoryStore: ensureFreeSpace(22692) called with curMem=163705, maxMem=280248975
15/04/22 18:15:22 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.2 KB, free 267.1 MB)
15/04/22 18:15:22 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:44015 (size: 22.2 KB, free: 267.2 MB)
scala> textFile.count()
15/04/22 18:16:07 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (README.md MapPartitionsRDD[1] at textFile at <console>:25)
15/04/22 18:16:07 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/04/22 18:16:08 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ubuntu64server, PROCESS_LOCAL, 1326 bytes)
15/04/22 18:16:23 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, ubuntu64server, PROCESS_LOCAL, 1326 bytes)
15/04/22 18:16:23 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, ubuntu64server): java.io.IOException: Failed to connect to localhost/127.0.0.1:44015
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)