0
votes

I have a Dataframe named df with following structure:

root
|-- country: string (nullable = true)
|-- competition: string (nullable = true)
|-- competitor: array (nullable = true)
    |-- element: struct (containsNull = true)
    |   |-- name: string (nullable = true)
        |-- time: string (nullable = true)

Which looks like this:

|country|competiton|competitor      |
|___________________________________| 
|USA    |WN        |[{Adam, 9.43}]  |
|China  |FN        |[{John, 9.56}]  |
|China  |FN        |[{Adam, 9.48}]  |
|USA    |MNU       |[{Phil, 10.02}] |
|...    |...       |...             |

I want to pivot (or something similar) the competitor Column into new columns depending on the values in each struct so it looks like this:

|country|competition|Adam|John|Phil  |
|____________________________________| 
|USA    |WN         |9.43|... |...   |
|China  |FN         |9.48|9.56|...   |
|USA    |MNU        |... |... |10.02 |              

The names are unique so if column already exists i dont want to create a new but fill the value in the already created one. There are alot of names so it needs to be done dynamically.

I have a pretty big dataset so i cant use Pandas.