
How can I split a column of StringType with the format '1-1235.0 2-1248.0 3-7895.2' into another column of ArrayType containing ['1', '2', '3']?


2 Answers


This is relatively simple with a UDF:

import org.apache.spark.sql.functions.udf
import spark.implicits._ // for toDF and the $"..." column syntax

val df = Seq("1-1235.0 2-1248.0 3-7895.2").toDF("input")

// For each space-separated "k-v" pair, keep the part before the dash as an Int
val extractFirst = udf((s: String) => s.split(" ").map(_.split('-')(0).toInt))

df.withColumn("newCol", extractFirst($"input"))
  .show()

gives

+--------------------+---------+
|               input|   newCol|
+--------------------+---------+
|1-1235.0 2-1248.0...|[1, 2, 3]|
+--------------------+---------+

I could not find an easy solution using only Spark's built-in functions (other than using split in combination with explode etc. and then re-aggregating).
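
For completeness, here is a minimal sketch of that built-in-only route, assuming the df defined above. The column names "id", "piece", and "prefix" are my own choices, and monotonically_increasing_id is used only to get a key to re-group on; note that collect_list gives no ordering guarantee after a shuffle, which is one reason a UDF is more convenient here:

import org.apache.spark.sql.functions.{split, explode, substring_index, collect_list, monotonically_increasing_id}
// assumes spark.implicits._ is imported, as in the snippet above

// Tag each row so the exploded pieces can be re-grouped afterwards
val withId = df.withColumn("id", monotonically_increasing_id())

withId
  .withColumn("piece", explode(split($"input", " ")))      // one row per "k-v" pair
  .withColumn("prefix", substring_index($"piece", "-", 1)) // keep the part before '-'
  .groupBy("id", "input")
  .agg(collect_list($"prefix").as("newCol"))               // re-assemble into an array
  .show()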


You can split the string into an array using the split function and then transform the array using the higher-order function TRANSFORM (available since Spark 2.4) together with substring_index:

import org.apache.spark.sql.functions.{split, expr}
import spark.implicits._ // for toDF and the $"..." column syntax

val df = Seq("1-1235.0 2-1248.0 3-7895.2").toDF("stringCol")

df.withColumn("array", split($"stringCol", " "))
  .withColumn("result", expr("TRANSFORM(array, x -> substring_index(x, '-', 1))")) // array of strings: ["1", "2", "3"]

Notice that this is a native approach; no UDF is applied.
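
If you are on Spark 3.0 or later, the same higher-order function is also exposed directly in the Scala DSL as functions.transform, so the expr() string can be avoided; a minimal sketch, reusing the df above:

import org.apache.spark.sql.functions.{split, substring_index, transform}

df.withColumn("array", split($"stringCol", " "))
  .withColumn("result", transform($"array", x => substring_index(x, "-", 1))) // array of strings: ["1", "2", "3"]
  .show(false)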