If you have RDD[Array[String]]
then you can get the desired data as
You can define a case class as
case class Data(Name: String, Id: Long)
Then parse each line to case class
val df = rdd.map( row => {
//split the line and convert to map so you can extract the data
val data = row.split(",").map(x => (x.split("=")(0),x.split("=")(1))).toMap
Data(data("name"), data("id").toLong)
})
convert to Dataframe and display
df.toDF().show(false)
Output:
+----+---+
|Name|Id |
+----+---+
|abc |1 |
|def |2 |
|ghi |3 |
+----+---+
Here is full code to read the file
case class Data(Name: String, Id: Long)
def main(args: Array[String]): Unit = {
val spark = SparkSession.builder().appName("xyz").master("local[*]").getOrCreate()
import spark.implicits._
val rdd = spark.sparkContext.textFile("path to file ")
val df = rdd.map(row => {
val data = row.split(",").map(x => (x.split("=")(0), x.split("=")(1))).toMap
Data(data("name"), data("id").toLong)
})
df.toDF().show(false)
}