I am trying to create and analyze a DataFrame with PySpark in a Jupyter Notebook.
Below is my code from the notebook.
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .master("local") \
    .appName("Neural Network Model") \
    .config("spark.executor.memory", "6gb") \
    .getOrCreate()
The Spark session starts without any problem.
df1 = spark.createDataFrame([('John', 56, 80)])
print(df1.dtypes)
print(df1)
print(df1.show())
I was able to create the DataFrame df1, and df1.dtypes and print(df1) both work, but somehow I get the error below as soon as I call df1.show():
Py4JJavaError                             Traceback (most recent call last)
      2 print(df1.dtypes)
      3 print(df1)
----> 4 print(df1.show())

Py4JJavaError: An error occurred while calling o501.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 9.0 failed 1 times, most recent failure: Lost task 0.0 in stage 9.0 (TID 22, localhost, executor driver): org.apache.spark.SparkException: Python worker failed to connect back.
    at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:170)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:97)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
    at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:108)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
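While searching for this "Python worker failed to connect back" message, I came across a suggested workaround: pointing Spark's worker processes at the same Python interpreter the notebook itself runs on, via the standard PYSPARK_PYTHON / PYSPARK_DRIVER_PYTHON environment variables, before building the session. I have not verified that this is the right fix for my case, which is partly why I am asking:

```python
import os
import sys

# Make Spark launch its Python workers with the same interpreter as this
# notebook. An interpreter mismatch between driver and workers is a commonly
# reported cause of "Python worker failed to connect back".
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
```

These variables would need to be set before getOrCreate() is called, since the session picks them up at startup.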
Could you help me fix this issue? I am not sure whether it is a problem with my system or with my code.
Thanks!!!