Spark/scala - can we create new columns from an existing column value in a dataframe

Question

I am trying to find out whether new columns can be created from the values in one column of a DataFrame using Spark/Scala. I have a DataFrame with the following data:

df.show()

+---+-----------------------+
|id |allvals                |
+---+-----------------------+
|1  |col1,val11|col3,val31  |
|3  |col3,val33|col1,val13  |
|2  |col2,val22             |
+---+-----------------------+

In the above data, col1/col2/col3 are column names, each followed by its value. A column name and its value are separated by a comma (,). Each name/value pair is separated by a pipe (|).

I want to produce the following result:

+---+----------------------+------+------+------+
|id |allvals               |col1  |col2  |col3  |
+---+----------------------+------+------+------+
|1  |col1,val11|col3,val31 |val11 |null  |val31 |
|3  |col3,val33|col1,val13 |val13 |null  |val33 |
|2  |col2,val22            |null  |val22 |null  |
+---+----------------------+------+------+------+

Appreciate any help.

Leo C · Accepted Answer · 2018-05-04T21:33:30

You can transform the DataFrame using split, explode and groupBy/pivot/agg, as follows:

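// In spark-shell, spark.implicits._ (needed for toDF and $"...") is already in scope.
import org.apache.spark.sql.functions._

val df = Seq(
  (1, "col1,val11|col3,val31"),
  (2, "col3,val33|col1,val13"),
  (3, "col2,val22")
).toDF("id", "allvals")

df.withColumn("temp", split($"allvals", "\\|")).   // split on the literal pipe into "key,value" pairs
  withColumn("temp", explode($"temp")).            // one row per pair
  withColumn("temp", split($"temp", ",")).         // Array(key, value)
  select($"id", $"allvals", $"temp".getItem(0).as("k"), $"temp".getItem(1).as("v")).
  groupBy($"id", $"allvals").pivot("k").agg(first($"v")).   // one column per distinct key
  show(false)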

// +---+---------------------+-----+-----+-----+
// |id |allvals              |col1 |col2 |col3 |
// +---+---------------------+-----+-----+-----+
// |1  |col1,val11|col3,val31|val11|null |val31|
// |3  |col2,val22           |null |val22|null |
// |2  |col3,val33|col1,val13|val13|null |val33|
// +---+---------------------+-----+-----+-----+
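
If the set of keys is known in advance, one possible variation is to pass them to pivot explicitly, so Spark does not need an extra pass over the data to discover the distinct keys. The sketch below assumes the keys are exactly col1, col2 and col3 (the `keys` value is hypothetical, not part of the original answer), and it builds a local SparkSession only so the snippet is self-contained; in spark-shell the existing `spark` can be used instead.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a local session for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("pivot-sketch").getOrCreate()
import spark.implicits._

val df = Seq(
  (1, "col1,val11|col3,val31"),
  (2, "col3,val33|col1,val13"),
  (3, "col2,val22")
).toDF("id", "allvals")

// Assumption: the complete list of keys is known up front.
val keys = Seq("col1", "col2", "col3")

val result = df
  .withColumn("pair", explode(split($"allvals", "\\|")))  // one row per "key,value" pair
  .withColumn("kv", split($"pair", ","))                   // Array(key, value)
  .select($"id", $"allvals", $"kv".getItem(0).as("k"), $"kv".getItem(1).as("v"))
  .groupBy($"id", $"allvals")
  .pivot("k", keys)                                        // explicit pivot values
  .agg(first($"v"))
  .orderBy($"id")                                          // deterministic row order for display

result.show(false)
```

With explicit values, the output always contains exactly the listed columns, in the listed order, and any key that appears in the data but not in `keys` is silently dropped, so this only fits when the key set really is fixed.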
