I have the following Spark DataFrame/Dataset:
column_1  column_2     column_3  column_4
A,B       NameA,NameB  F         NameF
C         NameC        NULL      NULL
NULL      NULL         D,E       NameD,NULL
G         NULL         H         NameH
I         NameI        J         NULL
All four columns above hold comma-delimited values. I have to convert this into a new DataFrame/Dataset that has only two columns and no comma delimiters. Each value in column_1 and its corresponding name in column_2 should be written to the output as one row, and similarly for column_3 and column_4. If both column_1 and column_2 are null, that pair is not required in the output.
Expected output:
out_column_1  out_column_2
A             NameA
B             NameB
F             NameF
C             NameC
D             NameD
E             NULL
G             NULL
H             NameH
I             NameI
J             NULL
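To make the pairing rule concrete, here is a minimal plain-Java sketch of the logic described above. It is not Spark code and not a proposed solution, only an illustration of the row-by-row behaviour I am after. The class and method names (PairFlattener, splitCell, addPairs, flatten) are made up for this example, and it assumes that the literal text NULL inside a comma-delimited cell (as in NameD,NULL) stands for a missing value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of the pairing rule only; this is NOT Spark code.
class PairFlattener {

    // Splits one comma-delimited cell into its pieces.
    // Assumption: the literal text "NULL" (or an empty piece) means "missing".
    static String[] splitCell(String cell) {
        if (cell == null) {
            return new String[0];
        }
        String[] parts = cell.split(",", -1);
        for (int i = 0; i < parts.length; i++) {
            String p = parts[i].trim();
            parts[i] = (p.isEmpty() || p.equals("NULL")) ? null : p;
        }
        return parts;
    }

    // Pairs the i-th value with the i-th name; skips pairs where both are null.
    static void addPairs(String values, String names, List<String[]> out) {
        String[] v = splitCell(values);
        String[] n = splitCell(names);
        int len = Math.max(v.length, n.length);
        for (int i = 0; i < len; i++) {
            String value = i < v.length ? v[i] : null;
            String name = i < n.length ? n[i] : null;
            if (value != null || name != null) {
                out.add(new String[] {value, name});
            }
        }
    }

    // Each input row is {column_1, column_2, column_3, column_4}.
    static List<String[]> flatten(List<String[]> rows) {
        List<String[]> out = new ArrayList<>();
        for (String[] r : rows) {
            addPairs(r[0], r[1], out); // column_1 with column_2
            addPairs(r[2], r[3], out); // column_3 with column_4
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"A,B", "NameA,NameB", "F", "NameF"},
            new String[] {"C", "NameC", null, null},
            new String[] {null, null, "D,E", "NameD,NULL"},
            new String[] {"G", null, "H", "NameH"},
            new String[] {"I", "NameI", "J", null});
        for (String[] pair : flatten(rows)) {
            System.out.println(pair[0] + " " + pair[1]);
        }
    }
}
```

Running main on the sample rows prints the ten pairs from the expected output in the same order, with missing names shown as null.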
Is there a way to achieve this with Spark in Java without using UDFs?