One of my DataFrames (spark.sql) has this schema:
root
|-- ValueA: string (nullable = true)
|-- ValueB: struct (nullable = true)
| |-- abc: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- a0: string (nullable = true)
| | | |-- a1: string (nullable = true)
| | | |-- a2: string (nullable = true)
| | | |-- a3: string (nullable = true)
|-- ValueC: struct (nullable = true)
| |-- pqr: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- info1: string (nullable = true)
| | | |-- info2: struct (nullable = true)
| | | | |-- x1: long (nullable = true)
| | | | |-- x2: long (nullable = true)
| | | | |-- x3: string (nullable = true)
| | | |-- info3: string (nullable = true)
| | | |-- info4: string (nullable = true)
|-- Value4: struct (nullable = true)
| |-- xyz: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- b0: string (nullable = true)
| | | |-- b2: string (nullable = true)
| | | |-- b3: string (nullable = true)
|-- Value5: string (nullable = true)
I need to save this to a CSV file, without using any flatten or explode, in the format below:
|-- ValueA: string (nullable = true)
|-- ValueB: struct (nullable = true)
|-- ValueC: struct (nullable = true)
|-- Value4: struct (nullable = true)
|-- Value5: string (nullable = true)
I have directly used the command `df.toPandas().to_csv("output.csv")`, which serves my purpose, but I need a better approach. I am using PySpark.