I have a CSV file in the following format:
key, age, marks, feature_n
abc, 23, 84, 85.3
xyz, 25, 67, 70.2
The number of feature columns can vary. In this example there are 3 features (age, marks and feature_n). I have to convert the data into a Map[String,String] like this:
[key,value]
["abc","age:23,marks:84,feature_n:85.3"]
["xyz","age:25,marks:67,feature_n:70.2"]
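To make the target format concrete, here is a minimal plain-Scala sketch (no Spark involved) of the mapping I am after for the sample rows. The names FeatureMapSketch, toValue, header and rows are hypothetical and only used for illustration; the values are kept as strings.

```scala
object FeatureMapSketch {
  // Hypothetical helper: pairs each feature name with its value
  // and joins the pairs as "name:value,name:value,...".
  def toValue(header: Seq[String], row: Seq[String]): String =
    header.zip(row).map { case (name, v) => s"$name:$v" }.mkString(",")

  // Feature names from the header row, excluding "key".
  val header: Seq[String] = Seq("age", "marks", "feature_n")

  // Sample rows from the CSV above, keyed by the "key" column.
  val rows: Map[String, Seq[String]] = Map(
    "abc" -> Seq("23", "84", "85.3"),
    "xyz" -> Seq("25", "67", "70.2")
  )

  // The desired Map[String, String].
  val result: Map[String, String] =
    rows.map { case (k, vs) => k -> toValue(header, vs) }
}
```

What I need is the equivalent of this on a Spark dataframe, where the list of feature names is not known in advance.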
I then have to join this data with another dataset A on the column 'key' and append the 'value' to another column in dataset A. The CSV file can be loaded into a dataframe whose schema is taken from the first row (the header) of the file:
val newRecords = sparkSession.read.option("header", "true").option("mode", "DROPMALFORMED").csv("/records.csv")
After loading, I will join the dataframe newRecords with dataset A and append the 'value' to one of the columns of dataset A.
How can I iterate over the columns of each row in newRecords, excluding the column "key", and generate a string of the form "age:23,marks:84,feature_n:85.3"?
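One direction that might work, as a sketch under assumptions rather than a confirmed solution: take the column names from the dataframe, drop "key", and build the string with the built-in concat and concat_ws functions instead of iterating over rows by hand. The object name FeatureStringSketch, the helper addValue and the in-memory stand-in for newRecords are hypothetical; the sketch also assumes the column names contain no dots or other characters that col would interpret specially.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, concat_ws, lit}

object FeatureStringSketch {
  // Hypothetical helper: keeps "key" and folds every other column
  // into a single "name:value,name:value,..." string column called "value".
  def addValue(df: DataFrame): DataFrame = {
    val featureCols = df.columns.filter(_ != "key")
    // One "name:value" expression per feature column.
    val pairs = featureCols.map(c => concat(lit(c + ":"), col(c)))
    // concat_ws joins the pairs with commas and skips null entries.
    df.select(col("key"), concat_ws(",", pairs: _*).as("value"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("feature-string-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for newRecords; CSV columns are read as strings
    // unless schema inference is enabled.
    val newRecords = Seq(
      ("abc", "23", "84", "85.3"),
      ("xyz", "25", "67", "70.2")
    ).toDF("key", "age", "marks", "feature_n")

    addValue(newRecords).show(false)
    spark.stop()
  }
}
```

Because the result is still a dataframe with columns key and value, it could then be joined with dataset A on 'key' in the usual way. Note that a null feature value would drop that pair from the string, since concat returns null when any input is null and concat_ws skips nulls.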
I can change the input file from CSV to JSON if that helps.
I am fairly new to Scala and Spark.