I'm working with the following DataFrame schema in PySpark (names changed for privacy):
|-- some_data: struct (nullable = true)
| |-- some_array: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- some_nested_array: array (nullable = true)
| | | | |-- element: struct (containsNull = true)
| | | | | |-- some_param_1: long (nullable = true)
| | | | | |-- some_param_2: string (nullable = true)
| | | | | |-- some_param_3: string (nullable = true)
| | | |-- some_param_4: string (nullable = true)
| | | |-- some_param_5: string (nullable = true)
| |-- some_other_array: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- some_param_6: string (nullable = true)
| | | |-- some_param_7: string (nullable = true)
| |-- yet_another_array: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- some_param_8: string (nullable = true)
| | | |-- some_param_9: string (nullable = true)
I'm struggling to use the explode function on the doubly nested array (some_nested_array inside some_array). Ideally, I would like each of the parameters underneath some_array in its own column, so that I can compare across some_param_1 through some_param_9, or even just some_param_1 through some_param_5.