2 votes

I installed the package spark-2.0.2-bin-without-hadoop.tgz on a local DEV box, but it fails to run, as shown below:

$ ./bin/spark-shell
NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

$ ./sbin/start-master.sh
NoClassDefFoundError: org/slf4j/Logger

Did I misinterpret the statement below, which says Spark can run without Hadoop?

"Do I need Hadoop to run Spark? No, but if you run on a cluster, you will need some form of shared file system (for example, NFS mounted at the same path on each node). If you have this type of filesystem, you can just deploy Spark in standalone mode."


2 Answers

6 votes

For the first issue, concerning FSDataInputStream: as noted in this Stack Overflow answer (https://stackoverflow.com/a/31331528),

the "without Hadoop" is a bit misleading in that this build of Spark is not tied to a specific build of Hadoop as opposed to not running without it. To run Spark using the "without Hadoop" version, you should bind it to your own Hadoop distribution.

For the second issue, concerning the missing SLF4J classes: as noted in this Stack Overflow answer (https://stackoverflow.com/a/39277696), you can include the SLF4J jar yourself, or, if you already have a Hadoop distribution installed, it already provides these classes and things should run as-is.
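If you would rather satisfy the SLF4J dependency directly, dropping the SLF4J jars into Spark's jars/ directory should also work, since the daemon scripts include that directory on their launch classpath. A rough sketch, with placeholder paths and versions:

$ cp /path/to/slf4j-api-1.7.x.jar /path/to/slf4j-log4j12-1.7.x.jar jars/
$ ./sbin/start-master.sh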

That said, you can download Apache Spark pre-built with Hadoop and still not use Hadoop itself. That package contains all the necessary jars, and you can point Spark at the local file system, e.g. by using a file:// URI when accessing your data (instead of HDFS).
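For example, with the pre-built-with-Hadoop package unpacked, something like the following should work out of the box (the file path is just a placeholder):

$ ./bin/spark-shell
scala> spark.read.textFile("file:///tmp/sample.txt").count()   // reads from the local filesystem, no HDFS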

1 vote

Yes. From the Spark downloads page, as of today (for Spark 3.1.1), the following package types are available:

  1. Pre-built for Apache Hadoop 2.7

This version (spark-3.1.1-bin-hadoop2.7.tgz) of Spark runs with Hadoop 2.7.

  2. Pre-built for Apache Hadoop 3.2 and later

This version (spark-3.1.1-bin-hadoop3.2.tgz) of Spark runs with Hadoop 3.2 and later.

  3. Pre-built with user-provided Apache Hadoop

This version (spark-3.1.1-bin-without-hadoop.tgz) of Spark runs with any user-provided version of Hadoop.

From the name of the last package (spark-3.1.1-bin-without-hadoop.tgz), it might appear that Hadoop is needed only for that version (i.e., 3.) and not for the others (i.e., 1. and 2.). However, the naming is ambiguous: Hadoop is needed only if we want to support HDFS and YARN. In standalone mode, Spark can run in a truly distributed setting (or with its daemons on a single machine) without Hadoop, as sketched below.
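A rough sketch of standalone mode on a single machine (script names as shipped with Spark 3.1; replace <master-host> with your hostname):

$ ./sbin/start-master.sh                                   # logs a spark://<master-host>:7077 URL
$ ./sbin/start-worker.sh spark://<master-host>:7077        # register a worker with that master
$ ./bin/spark-shell --master spark://<master-host>:7077    # run against the standalone cluster, no Hadoop involved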

For 1. and 2., you can run Spark without a Hadoop installation, since the core Hadoop libraries come bundled with the pre-built binaries (hence spark-shell works without throwing any exceptions); for 3., Spark will not work unless a Hadoop installation is provided, because that package ships without the Hadoop runtime.

In essence,

  • we will need to install Hadoop separately in all three cases (1., 2., and 3.) if we want to support HDFS and YARN
  • if we don't want to install Hadoop, we can use a Spark package pre-built with Hadoop (1. or 2.) and run it in standalone mode
  • if we want to pick our own version of Hadoop, then 3. should be used together with a separate Hadoop installation

For more information, refer to this passage from the docs:

There are two variants of Spark binary distributions you can download. One is pre-built with a certain version of Apache Hadoop; this Spark distribution contains built-in Hadoop runtime, so we call it with-hadoop Spark distribution. The other one is pre-built with user-provided Hadoop; since this Spark distribution doesn’t contain a built-in Hadoop runtime, it’s smaller, but users have to provide a Hadoop installation separately. We call this variant no-hadoop Spark distribution. For with-hadoop Spark distribution, since it contains a built-in Hadoop runtime already, by default, when a job is submitted to Hadoop Yarn cluster, to prevent jar conflict, it will not populate Yarn’s classpath into Spark ...
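The setting the docs refer to there is, if I recall the name correctly, spark.yarn.populateHadoopClasspath; with the with-hadoop distribution you can turn it back on at submit time if you actually want YARN's Hadoop jars on the classpath (the jar path below is just the example jar shipped in the 3.1.1 package):

$ ./bin/spark-submit \
    --master yarn \
    --conf spark.yarn.populateHadoopClasspath=true \
    --class org.apache.spark.examples.SparkPi \
    examples/jars/spark-examples_2.12-3.1.1.jar 100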