0
votes

I am new to PySpark. I installed PySpark on my Windows machine.

I downloaded Apache Spark from the Spark download URL.

I set HADOOP_HOME and SPARK_HOME in the environment variables.

path variable

my SPARK_HOME=C:\spark\spark-2.4.4-bin-hadoop2.7

my HADOOP_HOME=C:\spark\spark-2.4.4-bin-hadoop2.7

But when I enter pyspark at the command prompt, I get:

The system cannot find the path specified.

Even if I go to the bin directory and run pyspark there, it throws the same error.

I am not sure what I missed here. Please help.

5 Answers

2
votes

Set the environment variables as given below (add each bin folder to the existing PATH rather than replacing it):

Java

JAVA_HOME = C:\Program Files\Java\jdk1.8.0_73

PATH = C:\Program Files\Java\jdk1.8.0_73\bin

Hadoop

Create a folder C:\Hadoop\bin and place the winutils.exe file inside it.

HADOOP_HOME = C:\Hadoop

PATH = C:\Hadoop\bin

Spark

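Download whichever Spark version you need (e.g. spark-2.4.4-bin-hadoop2.7) and point SPARK_HOME at the extracted folder. The paths below use spark-2.3.1-bin-hadoop2.7 as an example.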

SPARK_HOME = C:\software\spark-2.3.1-bin-hadoop2.7

PATH = C:\software\spark-2.3.1-bin-hadoop2.7\bin
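
To confirm the settings above, you can check that each variable points at a real folder containing the file you expect. This is a minimal sketch, not part of the original answer: `check_spark_env` is a hypothetical helper name, and the file names it looks for (java.exe, winutils.exe, pyspark.cmd) assume a standard Windows layout.

```python
import os

# Assumption: each variable should point at a folder containing this file.
EXPECTED = {
    "JAVA_HOME": os.path.join("bin", "java.exe"),
    "HADOOP_HOME": os.path.join("bin", "winutils.exe"),
    "SPARK_HOME": os.path.join("bin", "pyspark.cmd"),
}

def check_spark_env(env=None):
    """Return a list of problems; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []
    for name, rel_path in EXPECTED.items():
        home = env.get(name)
        if not home:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(home):
            problems.append(f"{name} points to a missing directory: {home}")
        elif not os.path.isfile(os.path.join(home, rel_path)):
            problems.append(f"{name} has no {rel_path}")
    return problems

if __name__ == "__main__":
    for line in check_spark_env() or ["All three variables look fine."]:
        print(line)
```

With the setup from the question, where HADOOP_HOME points at the Spark folder, this check would likely report a missing winutils.exe, since the Spark download does not include that file.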

0
votes

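The easiest way to make an existing Spark download usable from Python is the findspark package. It does not install Spark itself; it locates the extracted folder and makes pyspark importable: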

pip install findspark

import findspark

findspark.init(r'\path\to\extracted\binaries\folder')

import pyspark
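
Roughly speaking, findspark works by putting the pyspark package that ships inside the Spark folder onto sys.path. The sketch below approximates that idea. It is not findspark's actual code: `add_spark_to_path` is a hypothetical name, and it assumes the usual layout where the Python sources live in `python` and the bundled py4j archive in `python/lib`.

```python
import glob
import os
import sys

def add_spark_to_path(spark_home):
    """Prepend Spark's bundled Python packages to sys.path (hypothetical helper)."""
    spark_python = os.path.join(spark_home, "python")
    # Spark bundles py4j as a zip; the version in the file name varies by release.
    py4j_zips = glob.glob(os.path.join(spark_python, "lib", "py4j-*.zip"))
    if not os.path.isdir(spark_python) or not py4j_zips:
        raise FileNotFoundError(f"{spark_home} does not look like a Spark folder")
    os.environ["SPARK_HOME"] = spark_home
    sys.path[:0] = [spark_python, py4j_zips[0]]
    return spark_python, py4j_zips[0]
```

If this raises FileNotFoundError for your folder, the path passed in is probably not the root of the extracted Spark archive.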

0
votes

I had the same problem. After some research, I found that my JDK and JRE versions did not match: I had jdk1.8.0_261 and jre1.8.0_271.

As a solution, I uninstalled both the JDK and the JRE and then installed jdk1.8.0_261, which installs both at the same version (jdk1.8.0_261 and jre1.8.0_261).

That resolved the issue.
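
One way to spot this kind of mismatch is to compare the version suffixes of the installed Java folders. This is a minimal sketch, assuming the folder names follow the jdk1.8.0_261 / jre1.8.0_271 pattern quoted above; `java_versions` and `has_mismatch` are hypothetical helper names.

```python
import re

# Matches folder names such as "jdk1.8.0_261" or "jre1.8.0_271".
_PATTERN = re.compile(r"^(jdk|jre)(\d+\.\d+\.\d+_\d+)$")

def java_versions(folder_names):
    """Map 'jdk' and 'jre' to the set of versions found in the folder names."""
    found = {"jdk": set(), "jre": set()}
    for name in folder_names:
        match = _PATTERN.match(name)
        if match:
            found[match.group(1)].add(match.group(2))
    return found

def has_mismatch(folder_names):
    """True when both a JDK and a JRE are present but share no version."""
    found = java_versions(folder_names)
    return bool(found["jdk"] and found["jre"]) and not (found["jdk"] & found["jre"])

print(has_mismatch(["jdk1.8.0_261", "jre1.8.0_271"]))  # True
print(has_mismatch(["jdk1.8.0_261", "jre1.8.0_261"]))  # False
```

On Windows the folder names could come from something like os.listdir(r"C:\Program Files\Java"), though the install location on your machine may differ.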

0
votes

Locate your Spark installation and set SPARK_HOME to its root folder, then install the findspark package, which does the rest of the work. For example, if pyspark lives at "/usr/spark-2.4.4/python/pyspark/", the Spark root is "/usr/spark-2.4.4", so in a notebook I would run the following. Use %env rather than !export, since !export only affects a temporary subshell and not the notebook's own process.

%env SPARK_HOME=/usr/spark-2.4.4
!pip install findspark

import findspark
findspark.init()
from pyspark.sql import SparkSession 

-1
votes

Try adding this code segment:

import os
import sys
os.environ['HADOOP_HOME'] = "Your_Hadoop_Home_Path"
# Example: os.environ['HADOOP_HOME'] = r"~file_path~\Hadoop\hadoop-3.x.x"

# This sets the HADOOP_HOME environment variable for the current Python process.