I am reading a Hive table as a DataFrame and mapping it to a new Dataset. I read specific string values from a struct type, and I want to format them before storing them in the case class.
For example, I read the struct field "listelements.sneaker.colors", which returns an array because there are several colors. Before storing them in the new Dataset, I want the colors formatted like this:
"red","blue","yellow" (quoted and comma separated)
and stored as a single string.
concat_ws joins the array elements with a comma, but I also need each element enclosed in double quotes. This is what I have so far:
session.read
  .table(footWear)
  .select(
    $"id",
    $"footWearCategory".as("category"),
    concat_ws(",", $"listelements".getField("sneaker").getField("colors")).as("availableColors"))
  .where($"date" === runDate)
  .as[FootWearInformation]

case class FootWearInformation(id: String, category: String, availableColors: String)
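
One possible way to get the quoted output, sketched under assumptions: use `","` (quote, comma, quote) as the separator for concat_ws, then wrap the result in an opening and a closing quote with concat. The helper name `quoteAndJoin`, the local SparkSession, and the sample row below are hypothetical and only there to make the sketch self-contained; in the query above the same expression would be applied to $"listelements".getField("sneaker").getField("colors").

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{concat, concat_ws, lit}

object QuotedColors {
  // Hypothetical helper: joins the array elements with "," (quote-comma-quote)
  // and then adds the outer opening and closing double quotes.
  def quoteAndJoin(arr: Column): Column =
    concat(lit("\""), concat_ws("\",\"", arr), lit("\""))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; the question uses an existing `session`.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("quoted-colors")
      .getOrCreate()
    import spark.implicits._

    // Sample data standing in for the Hive table (assumption).
    val df = Seq(("1", Seq("red", "blue", "yellow"))).toDF("id", "colors")

    df.select($"id", quoteAndJoin($"colors").as("availableColors"))
      .show(false)

    spark.stop()
  }
}
```

For the sample row this should yield "red","blue","yellow" as a single string. One caveat with this approach: an empty array comes out as a pair of quotes ("") rather than an empty string, so that case may need separate handling. On Spark 3.0 or later, wrapping each element with the `transform` function before calling concat_ws would be an alternative.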