I am trying to install PySpark on Google Colab using the code given below, but I am getting the following error:
tar: spark-2.3.2-bin-hadoop2.7.tgz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
This code ran successfully once, but it throws this error after the notebook restarts. I have even tried running it from a different Google account, but I get the same error.
(Also, is there any way to avoid reinstalling PySpark every time the notebook restarts?)
code:
--------------------------------------------------------------------------------
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q http://apache.osuosl.org/spark/spark-2.3.2/spark-2.3.2-bin-hadoop2.7.tgz
The following line seems to cause the problem, as it cannot find the downloaded file:
!tar xvf spark-2.3.2-bin-hadoop2.7.tgz
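One thing I tried while debugging (a sketch, not part of my original notebook): since `wget -q` suppresses all error output, running it without `-q` shows why the file never appears, and guarding the `tar` step confirms whether the download actually produced the archive:

```shell
# Run wget in non-verbose (but not quiet) mode so any HTTP error, e.g. a
# 404 if the mirror no longer hosts this Spark version, is printed.
wget -nv http://apache.osuosl.org/spark/spark-2.3.2/spark-2.3.2-bin-hadoop2.7.tgz \
  || echo "wget failed; the mirror may no longer host this version"

# Only untar if the download actually produced the file.
if [ -f spark-2.3.2-bin-hadoop2.7.tgz ]; then
    tar xf spark-2.3.2-bin-hadoop2.7.tgz
else
    echo "Nothing to untar: spark-2.3.2-bin-hadoop2.7.tgz was not downloaded"
fi
```

With this, the `tar: Cannot open` error is replaced by a clear message about the download itself.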
I have also tried the following two lines (instead of the two above), suggested in a Medium blog post, but with no better result.
!wget -q http://mirror.its.dal.ca/apache/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
!tar xvf spark-2.4.0-bin-hadoop2.7.tgz
!pip install -q findspark
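For context, here is the setup step that normally follows the lines above (a sketch only; the `JAVA_HOME` and `SPARK_HOME` paths are assumptions based on the default Colab JDK location and the directory the 2.3.2 tarball extracts to):

```python
import os

# Assumed paths: adjust SPARK_HOME to match the Spark version you untarred.
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.3.2-bin-hadoop2.7"

try:
    import findspark
    findspark.init()  # adds Spark's Python libraries to sys.path
except ImportError:
    # findspark is only available after `pip install -q findspark` has run
    pass
```

This step fails too when the download fails, since `SPARK_HOME` then points at a directory that does not exist.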
--------------------------------------------------------------------------------
Any ideas on how to resolve this error and install PySpark on Colab?