I am trying to iterate over an array of arrays stored in a Spark DataFrame column and am looking for the best way to do this.
Schema:
root
|-- Animal: struct (nullable = true)
| |-- Species: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- mammal: array (nullable = true)
| | | | |-- element: struct (containsNull = true)
| | | | | |-- description: string (nullable = true)
Currently I am using the logic below, but it only returns the descriptions from the first element of the Species array.
from pyspark.sql.functions import col

df.select(
    # getItem(0) only looks at the first element of the Species array
    col("Animal.Species").getItem(0).getItem("mammal").getItem("description")
)
Pseudo logic (the same extraction repeated for every Species element):
col("Animal.Species").getItem(0).getItem("mammal").getItem("description")
+
col("Animal.Species").getItem(1).getItem("mammal").getItem("description")
+
col("Animal.Species").getItem(2).getItem("mammal").getItem("description")
+
col("Animal.Species").getItem(...).getItem("mammal").getItem("description")
Desired example output (all description elements flattened into a single string):
llama, sheep, rabbit, hare
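I was thinking something along these lines might work (a rough sketch, assuming Spark 2.4+ for flatten and that every Species element actually has a mammal array), but I am not sure it is the best approach:

from pyspark.sql import functions as F

df.select(
    F.concat_ws(
        ", ",
        # Animal.Species.mammal is array<array<struct>>; flatten it to array<struct>,
        # then pull the description field out of every struct
        F.flatten(F.col("Animal.Species.mammal")).getField("description")
    ).alias("descriptions")
)

I am also not sure how this behaves when mammal or description is null somewhere in the middle, so a cleaner or more idiomatic alternative would be welcome.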