I can't find an easy and elegant solution to this one.
I have a df1 with this column:
|-- guitars: array (nullable = true)
| |-- element: long (containsNull = true)
I have a df2 made of guitars, with an id matching the longs in my df1.
root
|-- guitarId: long (nullable = true)
|-- make: string (nullable = true)
|-- model: string (nullable = true)
|-- type: string (nullable = true)
I want to join my two dfs so that, instead of an array of longs, I get an array of guitar structs from df2.
I'm using array_contains() to join the two dfs, but Spark explodes each array of n longs in df1 into n rows in the result df.
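To make the explosion concrete, here is the logic in plain Python (a sketch only, with made-up sample data mirroring the example rows below; the actual Spark join uses array_contains() on the array column against guitarId):

```python
# Plain-Python sketch of what an array_contains()-style join produces.
# The sample data is invented to match the example rows in this question.
players = [
    {"id": 2, "name": "Eric Clapton", "guitars": [1, 5]},
]
guitars = [
    {"guitarId": 1, "make": "Gibson", "model": "SG", "type": "Electric"},
    {"guitarId": 5, "make": "Fender", "model": "Stratocaster", "type": "Electric"},
]

# Joining on "array contains id" emits one output row per matching guitar,
# so a player with n guitar ids ends up on n rows.
joined = [
    {**p, **g}
    for p in players
    for g in guitars
    if g["guitarId"] in p["guitars"]
]
```

This is exactly the n-rows-per-player shape shown in the "after" table below.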
before
| 2|Eric Clapton| [1, 5]| [,,,]|
after
| 2|Eric Clapton| [1, 5]| [,,,]| 5|Fender|Stratocaster| Electric|
| 2|Eric Clapton| [1, 5]| [,,,]| 1|Gibson| SG| Electric|
What would be the most elegant way to transform this array column of longs into an array column of structs from another dataframe?
ideal
| 2|Eric Clapton|[[Fender, Stratocaster, Electric],[Gibson, SG, Electric]]| [,,,]|
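In plain Python, the regrouping I'm after would look like the sketch below. My guess is that the Spark equivalent is a groupBy on the player columns plus collect_list(struct(...)) over the exploded rows, but I'm not sure that's the elegant way (hence the question):

```python
from collections import defaultdict

# Exploded rows, as produced by the array_contains() join (one row per
# player/guitar pair); sample data invented to match the example above.
joined = [
    {"id": 2, "name": "Eric Clapton", "make": "Fender", "model": "Stratocaster", "type": "Electric"},
    {"id": 2, "name": "Eric Clapton", "make": "Gibson", "model": "SG", "type": "Electric"},
]

# Regroup: one row per player, with the guitars collected back into an
# array of structs (tuples stand in for structs here).
grouped = defaultdict(list)
for row in joined:
    grouped[(row["id"], row["name"])].append((row["make"], row["model"], row["type"]))

result = [
    {"id": pid, "name": name, "guitars": gs}
    for (pid, name), gs in grouped.items()
]
```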
Thanks in advance
(first question btw, be humble :P)