5
votes

I am trying to install PySpark on Google Colab using the code given below, but I am getting the following error:

tar: spark-2.3.2-bin-hadoop2.7.tgz: Cannot open: No such file or directory

tar: Error is not recoverable: exiting now

This code ran successfully once, but it has been throwing this error since the notebook restarted. I have even tried running it from a different Google account, but I get the same error.

(Also, is there any way to avoid reinstalling PySpark every time the notebook restarts?)

code:

--------------------------------------------------------------------------------------------------------------------------------

!apt-get install openjdk-8-jdk-headless -qq > /dev/null

!wget -q http://apache.osuosl.org/spark/spark-2.3.2/spark-2.3.2-bin-hadoop2.7.tgz

The following line seems to cause the problem, as it cannot find the downloaded file.

!tar xvf spark-2.3.2-bin-hadoop2.7.tgz

I have also tried the following two lines (instead of the two lines above), suggested in a Medium blog post, but with no better result.

!wget -q http://mirror.its.dal.ca/apache/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz

!tar xvf spark-2.4.0-bin-hadoop2.7.tgz

!pip install -q findspark

-------------------------------------------------------------------------------------------------------------------------------

Any ideas on how to resolve this error and install PySpark on Colab?


5 Answers

11
votes

I am running PySpark on Colab by just using

!pip install pyspark

and it works fine.
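For a quick sanity check (a minimal sketch of my own, assuming only the pip-installed PySpark and no separate Spark download), a local session can be started directly:

from pyspark.sql import SparkSession

# "colab-check" is an arbitrary app name; local[*] uses all available cores
spark = SparkSession.builder.master("local[*]").appName("colab-check").getOrCreate()
spark.range(5).show()  # prints ids 0..4 if the install works
spark.stop()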

2
votes

Date: 6-09-2020


Step 1: Install PySpark on Google Colab

!pip install pyspark

Step 2: Handling pandas and Spark DataFrames inside a Spark session

!pip install pyarrow

PyArrow facilitates data exchange between many components, for example reading a Parquet file with Python (pandas) and converting it to a Spark DataFrame, Falcon Data Visualization, or Cassandra, without worrying about the conversion (a short sketch of the pandas round trip follows the steps below).

Step 3: Create a Spark session

from pyspark.sql import SparkSession
spark = SparkSession.builder.master('local').getOrCreate()

Done ⭐
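As an optional illustration of Step 2, here is a minimal sketch of the pandas-to-Spark round trip that pyarrow accelerates; the arrow.enabled setting and the toy column names are my own assumptions for a Spark build with Arrow support:

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.master('local').getOrCreate()
spark.conf.set("spark.sql.execution.arrow.enabled", "true")  # Arrow-based conversion toggle (Spark 2.3+)

pdf = pd.DataFrame({"id": [1, 2, 3], "value": [0.1, 0.2, 0.3]})  # toy pandas frame
sdf = spark.createDataFrame(pdf)  # pandas -> Spark DataFrame
back = sdf.toPandas()             # Spark DataFrame -> pandas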

1
votes

You are getting this error because spark-2.3.2-bin-hadoop2.7 has been replaced by a newer version on the official site and the mirror sites.

Go to either of these paths and get the latest version:

  1. http://apache.osuosl.org/spark/
  2. https://www-us.apache.org/dist/spark/

Replace the Spark build version and you are done; everything will work smoothly.

!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q https://www-us.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
!tar xf /content/spark-2.4.3-bin-hadoop2.7.tgz
!pip install -q findspark
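To round this out, a minimal sketch of the findspark step that usually follows these cells; the JAVA_HOME and SPARK_HOME paths are assumptions matching the openjdk-8 package and the 2.4.3 tarball downloaded above:

import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"    # default path for the openjdk-8 package
os.environ["SPARK_HOME"] = "/content/spark-2.4.3-bin-hadoop2.7"  # where the tarball was unpacked

import findspark
findspark.init()  # makes the unpacked Spark build importable

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()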
1
votes

I had tried to install it in the same way, but even after checking the Spark versions carefully I was getting the same error. Running the code below worked for me!

!pip install pyspark
!pip install pyarrow
!pip install -q findspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.master('local[*]').appName('HelloWorld').getOrCreate()
0
votes

I have used the setup below to run PySpark and sparkdl on Google Colab.

# Installing spark 
!apt-get install openjdk-8-jre
!apt-get install scala
!pip install py4j
!wget -q https://downloads.apache.org/spark/spark-2.4.8/spark-2.4.8-bin-hadoop2.7.tgz
!tar xf spark-2.4.8-bin-hadoop2.7.tgz
!pip install -q findspark

# Installing databricks packages
!wget -q https://github.com/databricks/spark-deep-learning/archive/refs/tags/v1.5.0.zip 
!unzip v1.5.0.zip
!mv spark-deep-learning-1.5.0 databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11

# Clearing unnecessary space
!rm -r *.tgz *.zip sample_data
!ls

# Setting up environment variables
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.4.8-bin-hadoop2.7"

SUBMIT_ARGS = "--packages databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11 pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS

# Importing and initializing Spark
import findspark
findspark.init()
from pyspark.sql import SparkSession
# spark = SparkSession.builder.master("local[*]").getOrCreate()
spark = SparkSession.builder.appName("Test Setup").getOrCreate()
sc = spark.sparkContext
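A quick check (not part of the original setup) that the session and context are usable:

print(spark.version)                    # should report 2.4.8 for the build downloaded above
print(sc.parallelize(range(5)).sum())   # trivial RDD job; prints 10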