I have Spark and Hadoop installed on OS X. I successfully worked through an example in which Hadoop ran locally, files were stored in HDFS, and I ran Spark with
spark-shell --master yarn-client
and worked with HDFS from within the shell. I'm having problems, however, getting Spark to run without HDFS, i.e. just locally on my machine. I looked at this answer, but messing around with environment variables doesn't feel right when the Spark documentation says
It’s easy to run locally on one machine — all you need is to have java installed on your system PATH, or the JAVA_HOME environment variable pointing to a Java installation.
If I run the basic SparkPi example, I get the correct output.
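For reference, I'm running it along these lines, via the run-example script that ships with Spark (the trailing argument is just the number of slices to compute with):
$SPARK_HOME/bin/run-example SparkPi 10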
If I try to run the sample Java app the docs provide, I again get output, but this time with connection-refused errors relating to port 9000. That sounds like it's trying to connect to Hadoop, though I don't know why, because I'm not specifying that anywhere (I've included the app source after the log output below):
$SPARK_HOME/bin/spark-submit --class "SimpleApp" --master local[4] ~/study/scala/sampleJavaApp/target/simple-project-1.0.jar
Exception in thread "main" java.net.ConnectException: Call From 37-2-37-10.tssg.org/10.37.2.37 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
...
...
...
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
at org.apache.hadoop.ipc.Client$Connection.access(Client.java:367)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
at org.apache.hadoop.ipc.Client.call(Client.java:1381)
... 51 more
15/07/31 11:05:06 INFO spark.SparkContext: Invoking stop() from shutdown hook
15/07/31 11:05:06 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
...
...
...
15/07/31 11:05:06 INFO ui.SparkUI: Stopped Spark web UI at http://10.37.2.37:4040
15/07/31 11:05:06 INFO scheduler.DAGScheduler: Stopping DAGScheduler
15/07/31 11:05:06 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/07/31 11:05:06 INFO util.Utils: path = /private/var/folders/cg/vkq1ghks37lbflpdg0grq7f80000gn/T/spark-c6ba18f5-17a5-4da9-864c-509ec855cadf/blockmgr-b66cc31e-7371-472f-9886-4cd33d5ba4b1, already present as root for deletion.
15/07/31 11:05:06 INFO storage.MemoryStore: MemoryStore cleared
15/07/31 11:05:06 INFO storage.BlockManager: BlockManager stopped
15/07/31 11:05:06 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
15/07/31 11:05:06 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/07/31 11:05:06 INFO spark.SparkContext: Successfully stopped SparkContext
15/07/31 11:05:06 INFO util.Utils: Shutdown hook called
15/07/31 11:05:06 INFO util.Utils: Deleting directory /private/var/folders/cg/vkq1ghks37lbflpdg0grq7f80000gn/T/spark-c6ba18f5-17a5-4da9-864c-509ec855cadf
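For completeness, the app is essentially the Java quick-start example from the Spark docs; the path is just mine. The only thing it touches is a single local text file:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SimpleApp {
    public static void main(String[] args) {
        // Note: a scheme-less path like this gets resolved against whatever
        // the default filesystem is in the Hadoop configuration.
        String logFile = "/usr/local/spark/README.md"; // path on my machine
        SparkConf conf = new SparkConf().setAppName("Simple Application");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> logData = sc.textFile(logFile).cache();

        // Count lines containing "a" and lines containing "b"
        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("a"); }
        }).count();

        long numBs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("b"); }
        }).count();

        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
        sc.stop();
    }
}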
Any pointers/explanations as to where I'm going wrong would be much appreciated!
UPDATE
It seems that having the HADOOP_CONF_DIR environment variable set is causing the issue. Under that directory I have a core-site.xml which contains the following:
<property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
</property>
If I change the value, e.g. to <value>hdfs://localhost:9100</value>, then when I attempt to run the Spark job the connection-refused error refers to this changed port:
Exception in thread "main" java.net.ConnectException: Call From 37-2-37-10.tssg.org/10.37.2.37 to localhost:9100 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
So for some reason, despite being instructed to run locally, Spark is trying to connect to HDFS. If I remove the HADOOP_CONF_DIR environment variable, the job works fine.
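That is, after clearing the variable in the shell I'm launching from,

unset HADOOP_CONF_DIR

the following runs without errors: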
$SPARK_HOME/bin/spark-submit --class "SimpleApp" --master local[4] ~/study/scala/sampleJavaApp/target/simple-project-1.0.jar
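I'm also guessing (untested) that giving the input path an explicit scheme would sidestep the default filesystem entirely, regardless of HADOOP_CONF_DIR, i.e. changing the one line in SimpleApp above to something like:

// hypothetical variant: an explicit file:// scheme should bypass fs.default.name
JavaRDD<String> logData = sc.textFile("file:///usr/local/spark/README.md").cache();

But I'd still like to understand why --master local[4] doesn't already imply that.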