I'm reading a JSON file into a Spark DataFrame in Scala. I have a JSON field like

"areaGlobalIdList":[2389,3,2,1,2147,2142,2518]

Spark automatically infers the datatype of this field as Array[long]. I tried concat_ws, but it seems to work only with array[string]. When I tried casting the column to a string (below), the output shows as

scala> import org.apache.spark.sql.types.StringType
scala> import org.apache.spark.sql.functions._
scala> val cmrdd = sc.textFile("/user/nkthn/cm.json")
scala> val cmdf = sqlContext.read.json(cmrdd)
scala> val dfResults = cmdf.select($"areaGlobalIdList".cast(StringType)).withColumn("AREAGLOBALIDLIST", regexp_replace($"areaGlobalIdList", ",", "."))
scala> dfResults.show(20, false)
+------------------------------------------------------------------+
|AREAGLOBALIDLIST                                                  |
+------------------------------------------------------------------+
|org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@6364b584|
+------------------------------------------------------------------+

I'm expecting the output to be

[2389.3.2.1.2147.2142.2518]

Any assistance is greatly appreciated.


1 Answer


Given the schema of the areaGlobalIdList column as

 |-- areaGlobalIdList: array (nullable = true)
 |    |-- element: long (containsNull = false)
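
You can confirm this by printing the schema of the DataFrame from the question (assumed here to be cmdf):

// prints the inferred schema, including the element type of areaGlobalIdList
cmdf.printSchema()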

You can achieve this with a simple udf function as

import org.apache.spark.sql.functions._

// udf that joins the Long elements of the array into a single string, separated by "."
val concatWithDot = udf((array: collection.mutable.WrappedArray[Long]) => array.mkString("."))

// df is the DataFrame holding the areaGlobalIdList column (cmdf in the question)
df.withColumn("areaGlobalIdList", concatWithDot($"areaGlobalIdList")).show(false)
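
If you'd rather avoid a udf, a sketch of an alternative (assuming the cmdf DataFrame from the question) is to cast the array elements to strings first, so that concat_ws can join them with a dot:

import org.apache.spark.sql.functions.concat_ws

// cast array<long> to array<string>, then join the elements with "."
cmdf.withColumn("areaGlobalIdList",
    concat_ws(".", $"areaGlobalIdList".cast("array<string>")))
  .show(false)

Either way, the sample row comes out as 2389.3.2.1.2147.2142.2518 (without the surrounding brackets, since the column is now a plain string rather than an array).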