I have a PySpark dataframe (say df1) with the following columns:
1.> category : a string
2.> array1 : an array of elements
3.> array2 : an array of elements
Following is an example of df1:
+--------+--------------+--------------+
|category| array1| array2|
+--------+--------------+--------------+
|A | [x1, x2, x3]| [y1, y2, y3]|
|B | [u1, u2]| [v1, v2]|
+--------+--------------+--------------+
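For reproducibility, here is a minimal construction of df1 (a sketch only: I am assuming the array elements are plain strings and that a SparkSession named spark already exists):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample data matching the table above
df1 = spark.createDataFrame(
    [
        ("A", ["x1", "x2", "x3"], ["y1", "y2", "y3"]),
        ("B", ["u1", "u2"], ["v1", "v2"]),
    ],
    ["category", "array1", "array2"],
)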
Within each row, array1 and array2 always have the same length, but the length may differ from row to row.
I want to add two columns (say element1 and element2) such that, in each output row, element1 and element2 hold the elements taken from the same position of array1 and array2 respectively.
Following is an example of the output dataframe (say df2) that I want:
+--------+--------------+--------------+----------+----------+
|category| array1| array2| element1| element2|
+--------+--------------+--------------+----------+----------+
|A | [x1, x2, x3]| [y1, y2, y3]| x1| y1|
|A | [x1, x2, x3]| [y1, y2, y3]| x2| y2|
|A | [x1, x2, x3]| [y1, y2, y3]| x3| y3|
|B | [u1, u2]| [v1, v2]| u1| v1|
|B | [u1, u2]| [v1, v2]| u2| v2|
+--------+--------------+--------------+----------+----------+
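For reference, the same target written out literally (built the same way as df1 above, under the same string-element assumption), so candidate solutions can be compared against it:

# The expected result, constructed directly for comparison
expected = spark.createDataFrame(
    [
        ("A", ["x1", "x2", "x3"], ["y1", "y2", "y3"], "x1", "y1"),
        ("A", ["x1", "x2", "x3"], ["y1", "y2", "y3"], "x2", "y2"),
        ("A", ["x1", "x2", "x3"], ["y1", "y2", "y3"], "x3", "y3"),
        ("B", ["u1", "u2"], ["v1", "v2"], "u1", "v1"),
        ("B", ["u1", "u2"], ["v1", "v2"], "u2", "v2"),
    ],
    ["category", "array1", "array2", "element1", "element2"],
)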
Below is what I have tried so far, but in addition to the rows I want it also gives me rows where element1 and element2 come from different positions:

import pyspark.sql.functions as F

df2 = (
    df1
    .select("*", F.explode("array1").alias("element1"))
    .select("*", F.explode("array2").alias("element2"))
)
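If I understand explode correctly, the second explode is applied to every row already produced by the first one, so a row whose arrays have length n expands into n*n output rows, pairing every position of array1 with every position of array2. A quick check on the example data (using the df1 constructed above):

df2.count()                                # 13 rows (3*3 + 2*2) instead of the 5 I expect
df2.filter(df2.element1 == "x1").count()   # 3 rows: x1 paired with y1, y2 and y3

How can I pair the elements by position instead?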