I am trying to query a Spark SQL DataFrame with complex types, where the function itself should be able to create an expression that generates the column for nested complex datatypes. Say:
case class SubRecord(x: Int)
case class ArrayElement(foo: String, bar: Int, vals: Array[Double])
case class Record(
  an_array: Array[Int],
  a_map: Map[String, String],
  a_struct: SubRecord,
  an_array_of_structs: Array[ArrayElement])

val df = sc.parallelize(Seq(
  Record(Array(1, 2, 3), Map("foo" -> "bar"), SubRecord(1),
    Array(
      ArrayElement("foo", 1, Array(1.0, 2.0)),
      ArrayElement("bar", 2, Array(3.0, 4.0)))),
  Record(Array(4, 5, 6), Map("foz" -> "baz"), SubRecord(2),
    Array(
      ArrayElement("foz", 3, Array(5.0, 6.0)),
      ArrayElement("baz", 4, Array(7.0, 8.0))))
)).toDF
(taken from Querying Spark SQL DataFrame with complex types)
To extract a value from the map type, the query could be:
df.select($"a_map.foo").show
Now suppose I have

case class Record(
  an_array: Array[Int],
  a_map_new: Map[String, Array[ArrayElement]],
  a_struct: SubRecord,
  an_array_of_structs: Array[ArrayElement])

with Map[String, Array[ArrayElement]] instead of Map[String, String]. How can I create a UDF that takes a key (or an index, in the case of an array) and generates the result for that nested element of the complex datatype?

Say, for example, I now want to query vals[0] of the structs contained in a_map_new.
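For context, here is a sketch of the kind of lookup I am after, written with plain Column expressions by chaining extractions (the key "foo", the helper name firstVal, and the local SparkSession setup are only illustrative); what I am missing is how to turn this into a reusable UDF:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

case class SubRecord(x: Int)
case class ArrayElement(foo: String, bar: Int, vals: Array[Double])
// Record variant with the new map type from the question
case class RecordNew(
  an_array: Array[Int],
  a_map_new: Map[String, Array[ArrayElement]],
  a_struct: SubRecord,
  an_array_of_structs: Array[ArrayElement])

// key -> first array element -> its vals array -> element 0
def firstVal(df: DataFrame, key: String): Double =
  df.select(col("a_map_new")(key)(0)("vals")(0)).first.getDouble(0)

val spark = SparkSession.builder.master("local[*]").appName("sketch").getOrCreate()
import spark.implicits._

val df = Seq(
  RecordNew(Array(1, 2, 3),
    Map("foo" -> Array(ArrayElement("foo", 1, Array(1.0, 2.0)))),
    SubRecord(1),
    Array(ArrayElement("foo", 1, Array(1.0, 2.0))))
).toDF

println(firstVal(df, "foo"))  // should print 1.0
```

Each `(...)` on the Column chains an extraction (map key, array index, or struct field), so the whole path is resolved by the optimizer rather than a black-box function.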
Comments:

- You can't return Row from a udf, so you'd need mapping to an external object. Also, should it work for a complete map, or a single key? – Alper t. Turker
- Personally I'd choose a strongly typed Dataset for an op like this one. – Alper t. Turker
- Can you show what you want to query for vals[0] contained in a_map_new, with an appropriate example? – Ramesh Maharjan