11
votes

I'm very new to the concepts of Big Data and related areas; sorry if I've made any mistakes or typos.

I would like to understand Apache Spark and use it only on my computer, in a development / test environment. Since Hadoop includes HDFS (Hadoop Distributed File System) and other software that only matters for distributed systems, can I skip it? If so, where can I download a version of Spark that doesn't need Hadoop? Here I can find only Hadoop-dependent versions.

What do I need:

  • Run all of Spark's features without problems, but on a single computer (my home computer).
  • Everything I build on my computer with Spark should run on a future cluster without problems.

Is there any reason to use Hadoop or any other distributed file system with Spark if I will only run it on my computer for testing purposes?

Note that "Can apache spark run without hadoop?" is a different question from mine, because I do want to run Spark in a development environment.

2
Spark works with the native file system using Hadoop utilities, so you can just grab it and use it. Did you give it a try and it didn't work? - Justin Pihony
Can you send me the link to this Spark version? Also, I made some mistakes when I read the Spark documentation; I will edit the question now. - Paladini
Just go to the main site and download it with the Hadoop distro. - Justin Pihony
@JustinPihony I can't use Hadoop right now, my Spark with Hadoop isn't compiling. There's no version without Hadoop? - Paladini
That sounds like a different problem though, why isn't it compiling? - Justin Pihony

2 Answers

13
votes

Yes, you can install Spark without Hadoop. Go through the official Spark documentation: http://spark.apache.org/docs/latest/spark-standalone.html

Rough steps:

  1. Download a precompiled Spark package, or download the Spark source and build it locally
  2. Extract the tar
  3. Set the required environment variables
  4. Run the start script
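The steps above can be sketched as a short shell session. The version number and package name here are assumptions taken from the link below; substitute whatever release the download page currently offers. The download and extract commands are left commented out since they fetch a large file.

```shell
# Sketch of the four steps above, assuming the 2.2.0 prebuilt package;
# replace the version with the current release from the download page.
SPARK_VERSION="2.2.0"
PKG="spark-${SPARK_VERSION}-bin-hadoop2.7"

# 1. Download a precompiled Spark package
# wget "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${PKG}.tgz"

# 2. Extract the tar
# tar -xzf "${PKG}.tgz"

# 3. Set the required environment variables
export SPARK_HOME="$HOME/${PKG}"
export PATH="$SPARK_HOME/bin:$PATH"

# 4. Run the start script (starts a standalone master on this machine)
# "$SPARK_HOME/sbin/start-master.sh"

echo "$PKG"
```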

Spark (without Hadoop) is available on the Spark download page. URL: https://www.apache.org/dyn/closer.lua/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz

If this URL does not work, try getting it from the Spark download page.
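For the asker's single-machine use case, note that none of this requires HDFS: any extracted Spark build can run entirely in local mode, reading from the local filesystem. A minimal sketch, assuming $SPARK_HOME points at the extracted directory (that path is an assumption; set it to wherever you unpacked the tarball):

```shell
# 'local[*]' tells Spark to run on this machine only, using all CPU cores;
# no HDFS and no cluster daemons are involved.
MASTER="local[*]"

# Interactive Scala shell (uncomment to run; assumes $SPARK_HOME is set):
# "$SPARK_HOME/bin/spark-shell" --master "$MASTER"

# The same flag works for spark-submit, so an application developed
# locally can later move to a real cluster just by changing --master.
echo "$MASTER"
```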

0
votes

This is not a proper answer to the original question; sorry, my fault.


If someone wants to run Spark using the "without Hadoop" distribution tar.gz,

there is an environment variable to set. This spark-env.sh worked for me:

#!/bin/sh
# The "without Hadoop" Spark build ships no Hadoop jars, so point Spark
# at the jars of a locally installed Hadoop; `hadoop classpath` prints
# that install's jar list (requires the hadoop CLI on $PATH).
export SPARK_DIST_CLASSPATH=$(hadoop classpath)