I have data with the schema below. I want all the columns to be sorted alphabetically, in a PySpark DataFrame.
root
|-- _id: string (nullable = true)
|-- first_name: string (nullable = true)
|-- last_name: string (nullable = true)
|-- address: struct (nullable = true)
| |-- pin: integer (nullable = true)
| |-- city: string (nullable = true)
| |-- street: string (nullable = true)
The code below sorts only the outer columns, but not the nested ones.
>>> cols = df.columns
>>> df2 = df[sorted(cols)]
>>> df2.printSchema()
The schema after running this code looks like this:
root
|-- _id: string (nullable = true)
|-- address: struct (nullable = true)
| |-- pin: integer (nullable = true)
| |-- city: string (nullable = true)
| |-- street: string (nullable = true)
|-- first_name: string (nullable = true)
|-- last_name: string (nullable = true)
(Since _id starts with an underscore, it sorts first.)
The schema I want is shown below; even the columns inside address should be sorted:
root
|-- _id: string (nullable = true)
|-- address: struct (nullable = true)
| |-- city: string (nullable = true)
| |-- pin: integer (nullable = true)
| |-- street: string (nullable = true)
|-- first_name: string (nullable = true)
|-- last_name: string (nullable = true)
Thanks in advance.