I have a list of dictionaries that looks like the following; every dictionary is one list item.
my_list = [{"_id": 1, "name": "xxx"},
           {"_id": 2, "name": "yyy"},
           {"_id": 3, "_name": "zzz"}]
I am trying to convert this list into a PySpark DataFrame, with every dictionary becoming one row.
from pyspark.sql.types import StringType
df = spark.createDataFrame(my_list, StringType())
df.show()
My ideal result is the following:
+-----------------------+
|dic                    |
+-----------------------+
|{"_id":1,"name":"xxx"} |
|{"_id":2,"name":"yyy"} |
|{"_id":3,"_name":"zzz"}|
+-----------------------+
But instead I get this error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 25.0 failed 4 times, most recent failure: Lost task 0.3 in stage 25.0 (TID 95, 10.0.16.11, executor 0): org.apache.spark.api.python.PythonException: 'pyspark.serializers.SerializationError: Caused by Traceback (most recent call last):
What's wrong with my code?