I need to swap the value of multiple columns based in a condition of one of them and using the API instead sparkQL
I have a DataFrame with columns [A,B,C,X,Y,Z] and it has rows with this pattern:
(1,2,3,null,null,null),
(3,3,3,null,null,null),
(null,null,null,5,3,2),
(3,1,1,null,null,null),
(null,null,null,3,4,1)
I would like to achieve swapping the values between the groups of columns (a,b,c) and (x,y,z) when any row has the condition that the value in column A is null. So after the execution, the table should only have values in the columns [A,B,C]. I have tried to do something like
val result = df.withColumn("A", when(col("A").isNull, col("X")).withColumn("X", when(col("A").isNull,lit(null)).withColumn("B", when(col("A").isNull, col("Y")).withColumn("Y", when(col("A").isNull,lit(null)).withColumn("C", when(col("A").isNull, col("Z")).withColumn("Z", when(col("A").isNull,lit(null)).
Surprisingly executing only this:
val result = df.withColumn("A", when(col("A").isNull, col("X")).withColumn("X", when(col("A").isNull,lit(null))
I got the expected result for the the column swapping I defined with that line "(5,null,null,null,3,2)". But when I add the operations "withColumn" for the other 2 columns I ended up with rows like this "(5,null,null,null,3,2)"
Any help would be appreciated.