0
votes

I need to swap the value of multiple columns based in a condition of one of them and using the API instead sparkQL

I have a DataFrame with columns [A,B,C,X,Y,Z] and it has rows with this pattern:

(1,2,3,null,null,null),
(3,3,3,null,null,null),
(null,null,null,5,3,2),
(3,1,1,null,null,null),
(null,null,null,3,4,1)

I would like to achieve swapping the values between the groups of columns (a,b,c) and (x,y,z) when any row has the condition that the value in column A is null. So after the execution, the table should only have values in the columns [A,B,C]. I have tried to do something like

val result = df.withColumn("A", when(col("A").isNull, col("X")).withColumn("X", when(col("A").isNull,lit(null)).withColumn("B", when(col("A").isNull, col("Y")).withColumn("Y", when(col("A").isNull,lit(null)).withColumn("C", when(col("A").isNull, col("Z")).withColumn("Z", when(col("A").isNull,lit(null)).

Surprisingly executing only this:

val result = df.withColumn("A", when(col("A").isNull, col("X")).withColumn("X", when(col("A").isNull,lit(null))

I got the expected result for the the column swapping I defined with that line "(5,null,null,null,3,2)". But when I add the operations "withColumn" for the other 2 columns I ended up with rows like this "(5,null,null,null,3,2)"

Any help would be appreciated.

1

1 Answers

0
votes

After you swap "A" and "X", "A" is not null any more. Therefore, the subsequent operations don't work.

I strongly suggest that you use a flag column to help achieve this.

val result = df
.withColumn("flag", when(col("A").isNull, true).otherwise(false)
.withColumn("A", when(col("flag") === true, col("X"))
.withColumn("X", when(col("flag") === true, lit(null))
.withColumn("B", when(col("flag") === true, col("Y"))
.withColumn("Y", when(col("flag") === true, lit(null))
.withColumn("C", when(col("flag") === true, col("Z"))
.withColumn("Z", when(col("flag") === true, lit(null)).