I am reading a Hive table as a DataFrame and converting it to a new Dataset. I read specific string values from a struct-type column, and I want to format those values before storing them in the case class.

For example, I read the struct field "listelements.sneaker.colors", which returns an array because there are several colors. Before storing them in the new Dataset, I want the colors formatted like this:

"red","blue","yellow" (quoted and comma separated)

and stored as a single string.

concat_ws joins the array elements with commas, but I also need each element enclosed in double quotes.

session.read
  .table(footWear)
  .select(
    $"id",
    $"footWearCategory".as("category"),
    concat_ws(",", $"listelements".getField("sneaker").getField("colors")).as("availableColors"))
  .where($"date" === runDate)
  .as[FootWearInformation]


case class FootWearInformation(id: String, category: String, availableColors: String)
Write a UDF that takes an array and returns a string in the required format. If you need help with the UDF, please post a sample dataset. - partha_devArch
Thanks for the suggestion. Writing a UDF solved the problem - BusyBee

1 Answer

UDF:

import org.apache.spark.sql.functions.udf

// Wrap each element in double quotes, then join the results with commas.
val formatArray = udf((arr: Seq[String]) => arr.map(x => "\"" + x + "\"").mkString(","))
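
The quoting logic in the UDF above can also be kept in a plain Scala function, which makes it easy to check without a Spark session. The following is only a minimal sketch: the names ColorFormat and quoteAndJoin are hypothetical, and treating a null array or a null element as absent is an assumption, since the question does not say how missing colors should be handled.

```scala
object ColorFormat {
  // Hypothetical helper: wraps each element in double quotes and joins them with commas.
  // Assumption: a null array yields an empty string, and null elements are skipped.
  def quoteAndJoin(colors: Seq[String]): String =
    Option(colors)
      .getOrElse(Seq.empty[String])
      .filter(_ != null)
      .map(c => "\"" + c + "\"")
      .mkString(",")
}
```

To use it in the query, the function could be wrapped with udf(ColorFormat.quoteAndJoin _) and applied to the colors column in place of concat_ws. The null guard matters there because a UDF that takes a Seq receives null when the array column itself is null.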