
I am new to Spark and just started using it. I am trying to import `SparkSession` from `pyspark`, but it throws `ModuleNotFoundError: No module named 'pyspark'`. Please see my code below.

```
# Import our SparkSession so we can use it
from pyspark.sql import SparkSession
# Create our SparkSession, this can take a couple minutes locally
spark = SparkSession.builder.appName("basics").getOrCreate()
```

Error:
```
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-2-6ce0f5f13dc0> in <module>
      1 # Import our SparkSession so we can use it
----> 2 from pyspark.sql import SparkSession
      3 # Create our SparkSession, this can take a couple minutes locally
      4 spark = SparkSession.builder.appName("basics").getOrCreate()

ModuleNotFoundError: No module named 'pyspark'
```

I am in my conda env, and I tried `pip install pyspark`, but pip says it is already installed.
If you have pyspark installed, this code should work; there is nothing out of the ordinary here. Are you sure it was installed in the same environment that you run this code with? It is common to install packages into the machine's global Python installation and then find them missing from the Anaconda environment configured to run the Jupyter notebook; I have done it myself. – Daniel
To Daniel's point, the 'environment' here is Zepl, which has its own import syntax, as mentioned in my answer below. Ironically, "It's common to install packages in global python installation for machine and then have it missing from anaconda environment configured to run jupyter notebook..." is exactly the case and how I solved a similar issue. – Frank Stallone III
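Following up on Daniel's comment, a quick sanity check (a minimal sketch, assuming a Jupyter-style notebook) is to print which interpreter the kernel is actually running:

```
import sys

# The Python interpreter this notebook kernel is actually running.
# If this path is not inside your conda env, pyspark was likely
# installed into a different Python than the one doing the import.
print(sys.executable)
```

If the printed path points outside your conda env, install pyspark against that exact interpreter, e.g. `!{sys.executable} -m pip install pyspark` in a notebook cell, so the package lands in the environment the kernel imports from.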

1 Answer


If you are using Zepl, it has its own specific way of importing. This makes sense: since Zepl notebooks run in the cloud, each paragraph needs an interpreter directive that distinguishes Zepl's syntax from Python itself, for instance `%spark.pyspark`.

```
%spark.pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("basics").getOrCreate()
```
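If I understand Zepl correctly (it is built on Apache Zeppelin), `%spark.pyspark` is a Zeppelin-style interpreter directive: it tells the notebook to run the paragraph with the PySpark interpreter, whose environment already ships with `pyspark`, rather than with a plain Python interpreter where the module may not be installed.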