Consider a DataFrame df with four columns c0, c1, c2, and c3, where c0 and c1 are nested columns (struct type) and the other two are string type:
root
|-- c0: struct (nullable = true)
| |-- x: string (nullable = true)
| |-- y: string (nullable = true)
|-- c1: struct (nullable = true)
| |-- x: string (nullable = true)
| |-- y: string (nullable = true)
|-- c2: string (nullable = true)
|-- c3: string (nullable = true)
I want to select all the fields of either c0 or c1, depending on the value of c3.
Example: if the value of c3 is "d", I want to select c0.*, otherwise c1.*.
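For reference, here is a minimal sketch of how a DataFrame with this shape can be built (the column values and the Inner case class are made up purely for illustration):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("struct-select").getOrCreate()
import spark.implicits._

// Inner mirrors the struct<x: string, y: string> shape of c0 and c1.
case class Inner(x: String, y: String)

val df = Seq(
  (Inner("x0", "y0"), Inner("x1", "y1"), "foo", "d"),
  (Inner("a0", "b0"), Inner("a1", "b1"), "bar", "e")
).toDF("c0", "c1", "c2", "c3")

df.printSchema() // prints the schema shown above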
This is what I've tried so far, without luck.
First approach: using when and otherwise inside the select clause.
df.select(
  col("c3"),
  col("c2"),
  when(col("c3") === "d", col("c0.*")).otherwise(col("c1.*")))
This gives the following exception:
org.apache.spark.sql.AnalysisException: Invalid usage of '*' in expression 'casewhen';
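Interestingly, if I drop the * and select the whole struct, the analyzer accepts the when/otherwise (a sketch, reusing the df built above), but the result keeps a single nested struct column instead of the flattened x and y fields I'm after:

import org.apache.spark.sql.functions.{col, when}

// Accepted by the analyzer, but "picked" remains a struct<x, y> column;
// its fields are not expanded the way c0.* would expand them.
val nested = df.select(
  col("c2"),
  col("c3"),
  when(col("c3") === "d", col("c0")).otherwise(col("c1")).as("picked"))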
Then, instead of using col, I tried referencing the columns through df(...):
df.select(
  col("c3"),
  col("c2"),
  when(col("c3") =!= "d", df("c0").otherwise(df("c1"))))
This gives the following exception:
otherwise() can only be applied on a Column previously generated by when()
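The only workaround I've come up with is to split the DataFrame on the condition and union the two halves back together; a sketch (again assuming the df from above, and that both structs share the field names x and y):

import org.apache.spark.sql.functions.col

val fromC0 = df.filter(col("c3") === "d")
  .select(col("c2"), col("c3"), col("c0.x"), col("c0.y"))
val fromC1 = df.filter(col("c3") =!= "d")
  .select(col("c2"), col("c3"), col("c1.x"), col("c1.y"))

// union matches columns by position; note that rows where c3 is null
// fall through both filters and are silently dropped.
val result = fromC0.union(fromC1)

This runs, but it scans df twice and feels clumsy for what should be a single projection.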
Is there a way to do this in a single select? Any help would be appreciated!
PS: I'm a beginner in Spark :)