I am trying to run a simple Spark job via spark-shell, and it looks like the BlockManager for the spark-shell driver listens on localhost instead of the configured IP address, which causes the job to fail. The exception thrown is "Failed to connect to localhost".
Here is my configuration:
Machine 1 (ubunt64): Spark master [192.168.253.136]
Machine 2 (ubuntu64server): Spark slave [192.168.253.137]
Machine 3 (ubuntu64server2): Spark shell client [192.168.253.138]
Spark version: spark-1.3.0-bin-hadoop2.4
Environment: Ubuntu 14.04
Source code to be executed in the Spark shell:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

// Point the driver at the standalone master and bind it to this machine's IP
val conf = new SparkConf().setMaster("spark://192.168.253.136:7077")
conf.set("spark.driver.host", "192.168.253.138")
conf.set("spark.local.ip", "192.168.253.138")

// Stop the SparkContext the shell created and build a new one with this conf
sc.stop()
val sc = new SparkContext(conf)

val textFile = sc.textFile("README.md")
textFile.count()
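For what it's worth, the same properties can also be passed on the spark-shell command line, so the SparkContext the shell builds at startup already uses them (a sketch using the IPs from the setup above; adjust the path to your Spark install):

./bin/spark-shell \
  --master spark://192.168.253.136:7077 \
  --conf spark.driver.host=192.168.253.138 \
  --conf spark.local.ip=192.168.253.138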
The above code works fine if I run it on Machine 2, where the slave is running, but it fails on Machine 1 (master) and Machine 3 (Spark shell client).
I am not sure why the spark shell advertises localhost instead of the configured IP address. I have set SPARK_LOCAL_IP on Machine 3 in spark-env.sh as well as in .bashrc (export SPARK_LOCAL_IP=192.168.253.138). I confirmed that the spark-shell Java process does listen on port 44015, yet it still broadcasts the localhost address.
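For reference, this is roughly what I have on Machine 3 (a sketch of my setup; SPARK_LOCAL_IP is the variable documented in the Spark standalone configuration docs):

# conf/spark-env.sh on the spark-shell machine (Machine 3)
export SPARK_LOCAL_IP=192.168.253.138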
Any help troubleshooting this issue would be highly appreciated; I am probably missing some configuration setting.
Logs:
scala> val textFile = sc.textFile("README.md")
15/04/22 18:15:22 INFO MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=280248975
15/04/22 18:15:22 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 267.1 MB)
15/04/22 18:15:22 INFO MemoryStore: ensureFreeSpace(22692) called with curMem=163705, maxMem=280248975
15/04/22 18:15:22 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.2 KB, free 267.1 MB)
15/04/22 18:15:22 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:44015 (size: 22.2 KB, free: 267.2 MB)
scala> textFile.count()
15/04/22 18:16:07 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (README.md MapPartitionsRDD[1] at textFile at <console>:25)
15/04/22 18:16:07 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/04/22 18:16:08 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ubuntu64server, PROCESS_LOCAL, 1326 bytes)
15/04/22 18:16:23 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, ubuntu64server, PROCESS_LOCAL, 1326 bytes)
15/04/22 18:16:23 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, ubuntu64server): java.io.IOException: Failed to connect to localhost/127.0.0.1:44015
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)