
I have a DataFrame with a complex column of type ArrayType(StructType). To transform this DataFrame I created a UDF that consumes the column as an Array[case class] parameter. The main bottleneck: when I create the case class to match the StructType, the StructField name contains special characters, for example "##field". So I give the case class field the same name (written in backticks, since "##field" is not a legal plain identifier) and attach the case class to the UDF parameter. After compilation, the case class field name in the UDF definition changes to "$hash$hashfield". Transforming the DataFrame then fails because of this mismatch. Please help.
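For reference, a sketch of the kind of schema and case class involved (the names "items" and "Item" are illustrative, based on the question):

// df.printSchema() for an ArrayType(StructType) column looks like:
// root
//  |-- items: array (nullable = true)
//  |    |-- element: struct (containsNull = true)
//  |    |    |-- ##field: string (nullable = true)

// Mirror case class; backticks are required for the special characters:
case class Item(`##field`: String)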


1 Answer


Due to JVM limitations, Scala stores identifiers in an encoded form, and currently Spark can't map ##field to $hash$hashfield.
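You can see the encoding directly with Scala's NameTransformer, and in the compiled class itself (a quick REPL sketch; Rec is an illustrative name):

import scala.reflect.NameTransformer

NameTransformer.encode("##field")
// res0: String = $hash$hashfield

// The encoded name is what ends up in the compiled case class:
case class Rec(`##field`: String)
classOf[Rec].getDeclaredFields.map(_.getName)
// res1: Array[String] = Array($hash$hashfield)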

One possible solution is to extract the fields manually from the raw Row (but you need to know the order of the fields in the DataFrame; you can use df.schema for that):

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

case class Foo(a: String)

// Pattern match on the Row's positional values...
val myUdf = udf { (struct: Row) =>
  struct match {
    case Row(a: String) => Foo(a)
  }
}

// ...or extract each value by position (check df.schema for the order):
val myUdf2 = udf { (struct: Row) =>
  val `##a` = struct.getAs[String](0)
  Foo(`##a`)
}
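Illustrative usage (the DataFrame and the column name "s" are assumptions; also note the Row-typed udf pattern above is from the Spark 2.x era and may need adjusting on newer Spark versions):

import org.apache.spark.sql.functions.{col, lit, struct}

// A one-row DataFrame with a struct column "s" whose field is "##field":
val df = spark.range(1)
  .select(struct(lit("x").as("##field")).as("s"))

// The udf reads the value by position, so the special name never has
// to be mapped onto a case class field:
df.select(myUdf2(col("s")).as("foo")).show()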