
I have a PySpark dataframe with a column named Filters of type "array<struct<Op:string,Type:string,Val:string>>".

I want to save my dataframe to a CSV file; for that I need to cast the array to string type.

I tried to cast it with DF.Filters.tostring() and DF.Filters.cast(StringType()), but both solutions generate an error message for each row in the Filters column:


The code is as follows

from pyspark.sql.types import StringType


|-- ClientNum: string (nullable = true)
|-- Filters: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- Op: string (nullable = true)
|    |    |-- Type: string (nullable = true)
|    |    |-- Val: string (nullable = true)

DF_cast = DF.select('ClientNum', DF.Filters.cast(StringType()))


|-- ClientNum: string (nullable = true)
|-- Filters: string (nullable = true)


|ClientNum|Filters                                                           |
|32103    |org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@d9e517ce|
|218056   |org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3c744494|

Sample JSON data:


Thanks !!

Can you share the minimal code? - Abhishek Bansal
Can you print the schema and show the data before the transformation? Also print the schema after the transformation. - Abhishek Bansal
The schema seems to be correct. - Omar14
I'm not able to recreate the issue. Can you show the data before the transformation? - Abhishek Bansal

3 Answers


I created a sample JSON dataset to match that schema:



|ClientNum|Filters                                                           |
|abc123   |org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@60fca57e|

Your problem is best solved using the explode() function, which produces one row per array element, followed by the star expand notation, which turns the struct fields into separate columns:

s.selectExpr("explode(Filters) AS structCol").selectExpr("structCol.*").show()
| Op|Type|Val|
|foo| bar|baz|

To make it a single string column with the fields separated by commas (F here is pyspark.sql.functions):

s.selectExpr("explode(Filters) AS structCol").select(F.expr("concat_ws(',', structCol.*)").alias("single_col")).show()
| single_col|

Explode Array reference: Flattening Rows in Spark

Star expand reference for "struct" type: How to flatten a struct in a spark dataframe?


For me, in PySpark, the to_json() function did the job.

As a plus compared to simply casting to String, it keeps the "struct keys" as well (not only the "struct values"). So for the reported example I would have something like:


This was much more useful to me since I had to write the results to a Postgres table. In this format I can easily use the supported JSON functions in Postgres.


You can try this:

DF = DF.withColumn('Filters', DF.Filters.cast("string"))