I am facing the error below while developing a data pipeline with Python and PySpark.
```
PS C:\Users\folder\Documents\folder\projects\code\etl-gd\src\jobs\greater-data> python test.py
Traceback (most recent call last):
  File "test.py", line 1, in <module>
    from pyspark.conf import SparkConf
  File "C:\Users\folder\AppData\Local\Programs\Python\Python37\lib\site-packages\pyspark\__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "C:\Users\folder\AppData\Local\Programs\Python\Python37\lib\site-packages\pyspark\context.py", line 43, in <module>
    from pyspark.profiler import ProfilerCollector, BasicProfiler
  File "C:\Users\folder\AppData\Local\Programs\Python\Python37\lib\site-packages\pyspark\profiler.py", line 18, in <module>
    import cProfile
  File "C:\Users\folder\AppData\Local\Programs\Python\Python37\lib\cProfile.py", line 10, in <module>
    import profile as _pyprofile
  File "C:\Users\folder\Documents\folder\projects\code\etl-gd\src\jobs\greater-data\profile.py", line 2, in <module>
    from awsglue.context import GlueContext
  File "C:\Users\folder\Documents\folder\projects\code\etl-gd\src\jobs\greater-data\awsglue\__init__.py", line 13, in <module>
    from .dynamicframe import DynamicFrame
  File "C:\Users\folder\Documents\folder\projects\code\etl-gd\src\jobs\greater-data\awsglue\dynamicframe.py", line 20, in <module>
    from pyspark.sql.dataframe import DataFrame
  File "C:\Users\folder\AppData\Local\Programs\Python\Python37\lib\site-packages\pyspark\sql\__init__.py", line 45, in <module>
    from pyspark.sql.types import Row
  File "C:\Users\folder\AppData\Local\Programs\Python\Python37\lib\site-packages\pyspark\sql\types.py", line 36, in <module>
    from pyspark import SparkContext
ImportError: cannot import name 'SparkContext' from 'pyspark' (C:\Users\folder\AppData\Local\Programs\Python\Python37\lib\site-packages\pyspark\__init__.py)
```
The script is a minimal one, written only to reproduce the problem:

```python
from pyspark.conf import SparkConf

print("hello world")
```
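One thing I notice in the traceback: when `pyspark.profiler` imports `cProfile`, the standard library's `cProfile.py` then does `import profile`, and that appears to resolve to my project file `...\greater-data\profile.py` instead of the standard library's `profile` module, which restarts the whole pyspark import chain mid-import. A quick diagnostic sketch (not part of my pipeline, just to check which file Python would load for that name) is:

```python
# Diagnostic: print which file the name "profile" resolves to.
# Run this from the project directory; if the printed path points at the
# project's own profile.py rather than the Python standard library, the
# local file is shadowing the stdlib module that cProfile depends on.
import importlib.util

spec = importlib.util.find_spec("profile")
print(spec.origin)
```

Running this from a directory without a local `profile.py` should print a path inside the Python installation's `lib` directory.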
Java, Spark, Python and PySpark are all properly installed, as the output below shows:
```
PS C:\Users\folder\Documents\folder\projects\code\etl-gd\src\jobs\greater-data> java -version
java version "1.8.0_241"
Java(TM) SE Runtime Environment (build 1.8.0_241-b07)
Java HotSpot(TM) 64-Bit Server VM (build 25.241-b07, mixed mode)

PS C:\Users\folder\Documents\folder\projects\code\etl-gd\src\jobs\greater-data> python --version
Python 3.7.6

PS C:\Users\folder\Documents\folder\projects\code\etl-gd\src\jobs\greater-data> spark-shell --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Scala version 2.11.12, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_231
Branch heads/v2.4.3
Compiled by user vaviliv on 2019-09-17T17:31:05Z
Revision c3e32bf06c35ba2580d46150923abfa795b4446a
Url https://github.com/apache/spark
Type --help for more information.

PS C:\Users\folder\Documents\folder\projects\code\etl-gd\src\jobs\greater-data> pyspark --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Scala version 2.11.12, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_231
Branch heads/v2.4.3
Compiled by user vaviliv on 2019-09-17T17:31:05Z
Revision c3e32bf06c35ba2580d46150923abfa795b4446a
Url https://github.com/apache/spark
Type --help for more information.
```
Thank you in advance for your help.