0
votes

I'm working with Windows and trying to set up Spark.

Previously I installed Hadoop in addition to Spark, edited the config files, ran hadoop namenode -format and away we went.

I'm now trying to achieve the same using the bundled version of Spark that is pre-built with Hadoop - spark-1.6.1-bin-hadoop2.6.tgz

So far it's been a much cleaner, simpler process; however, I no longer have access to the command that creates the HDFS, the HDFS config files are no longer present, and there's no 'hadoop' executable in any of the bin folders.

There wasn't a Hadoop folder in the Spark install, so I created one for the purposes of winutils.exe.

It feels like I've missed something. Do the pre-built versions of Spark not include Hadoop? Is this functionality missing from this variant, or is there something else that I'm overlooking?

Thanks for any help.

2
Spark is not prebuilt with Hadoop; it is prebuilt with the client libraries for accessing Hadoop. You should install Hadoop separately from Spark. - mgaido
@mark91 - I guess that's the bit I was missing then :) thank you - null
@mark91 hope you don't mind but would you be able to elaborate some and place it in an answer please? When you say for 'accessing hadoop', do you mean within a spark application or..? - null

2 Answers

1
votes

Saying that Spark is built with Hadoop means that Spark is built with the Hadoop dependencies, i.e. with the client libraries for accessing Hadoop (or HDFS, to be more precise).

Thus, if you use a version of Spark which is built for Hadoop 2.6, you will be able to access the HDFS filesystem of a Hadoop 2.6 cluster through Spark.
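
For example, a minimal sketch of reading a file from HDFS through Spark - the namenode host, port and path below are placeholders, not values from the question:

    import org.apache.spark.{SparkConf, SparkContext}

    object HdfsReadExample {
      def main(args: Array[String]): Unit = {
        // Local Spark context; the HDFS URI below is a hypothetical example
        val conf = new SparkConf().setAppName("hdfs-read-example").setMaster("local[*]")
        val sc = new SparkContext(conf)

        // Read a text file from a Hadoop 2.6 cluster using the bundled Hadoop client libraries
        val lines = sc.textFile("hdfs://namenode-host:8020/path/to/file.txt")
        println(lines.count())

        sc.stop()
      }
    }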

It doesn't mean that Hadoop is part of the package, or that downloading it installs Hadoop as well. You have to install Hadoop separately.

If you download a Spark release without Hadoop support, you'll need to include the Hadoop client libraries in all the applications you write which are supposed to access HDFS (via a textFile, for instance).
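
As a rough sketch, an sbt build for that case might look like the following - the hadoop-client version shown is an assumption, so match it to your cluster:

    // build.sbt - hypothetical example; adjust versions to your cluster
    name := "my-spark-app"
    scalaVersion := "2.10.6"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.6.1" % "provided",
      // Needed when the Spark release you run against was built without Hadoop
      "org.apache.hadoop" % "hadoop-client" % "2.6.0"
    )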

0
votes

I am also using the same Spark on Windows 10. What I did was create a C:\winutils\bin directory and put winutils.exe there, then create a HADOOP_HOME=C:\winutils environment variable. If you have set all the environment variables and PATH, like SPARK_HOME, HADOOP_HOME etc., then it should work.
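
If you'd rather not set a system-wide environment variable, a common alternative is to point hadoop.home.dir at the winutils directory from inside the application before creating the SparkContext. This is only a sketch, and the C:\winutils path is an assumption matching the layout above:

    import org.apache.spark.{SparkConf, SparkContext}

    object WindowsSparkBootstrap {
      def main(args: Array[String]): Unit = {
        // Equivalent of HADOOP_HOME=C:\winutils; the directory must contain bin\winutils.exe
        System.setProperty("hadoop.home.dir", "C:\\winutils")

        val conf = new SparkConf().setAppName("windows-example").setMaster("local[*]")
        val sc = new SparkContext(conf)
        // ... your job here ...
        sc.stop()
      }
    }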