I have the following RDD (named AllTrainingDATA_RDD), which is of type org.apache.spark.rdd.RDD[(String, Double, Double, String)]:
(ICCH_1,4.3,3.0,Iris-setosa)
(ICCH_1,4.4,2.9,Iris-setosa)
(ICCH_1,4.4,3.0,Iris-setosa)
(ICCH_2,4.4,3.2,Iris-setosa)
1st column: ICCH_ID, 2nd column: X_Coordinates, 3rd column: Y_Coordinates, 4th column: Class
I would like to end up with an RDD that uses the 2nd and 3rd columns as the key and the 4th column as the value. The ICCH_ID column should remain in the RDD.
My current attempt, based on some Internet research, is this:
val AllTrainingDATA_RDD_Final = AllTrainingDATA_RDD.map(_.split(",")).keyBy(_(X_COORD,Y_COORD)).mapValues(fields => ("CLASS")).groupByKey().collect()
However, I get this error:
error: value split is not a member of (String, Double, Double, String)
P.S. I am using Databricks Community Edition. I am new to Scala.
stdlib as well as the Spark one. Finally, Google is your friend. PS: The error is pretty clear to me: your RDD is of tuples, tuples do not have a split method, and your field accesses are bad too. - Luis Miguel Mejía Suárez
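As the comment points out, the RDD already holds tuples, so there is nothing to split; the fields can be reached by pattern matching on the tuple instead. A minimal sketch of that idea follows. It assumes that "ICCH_ID should remain in the RDD" means carrying it in the value next to the class, which is one possible reading of the question. The names KeyByCoordinates and rekey are hypothetical, and the local SparkSession is only there to make the sketch self-contained; on Databricks the provided sc can be used instead.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object KeyByCoordinates {
  // Hypothetical helper: turn (id, x, y, class) into ((x, y), (id, class)).
  // Keeping the ID inside the value is an assumption about the desired shape.
  def rekey(row: (String, Double, Double, String)): ((Double, Double), (String, String)) =
    row match {
      case (icchId, x, y, cls) => ((x, y), (icchId, cls))
    }

  // Apply the re-keying to every element of the RDD.
  def keyByCoordinates(
      rows: RDD[(String, Double, Double, String)]
  ): RDD[((Double, Double), (String, String))] =
    rows.map(rekey)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; Databricks notebooks already expose `sc`.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("key-by-coordinates")
      .getOrCreate()
    val sc = spark.sparkContext

    // Sample rows copied from the question.
    val AllTrainingDATA_RDD = sc.parallelize(Seq(
      ("ICCH_1", 4.3, 3.0, "Iris-setosa"),
      ("ICCH_1", 4.4, 2.9, "Iris-setosa"),
      ("ICCH_1", 4.4, 3.0, "Iris-setosa"),
      ("ICCH_2", 4.4, 3.2, "Iris-setosa")
    ))

    // Key by (x, y), then group all (id, class) pairs that share a coordinate.
    val AllTrainingDATA_RDD_Final =
      keyByCoordinates(AllTrainingDATA_RDD).groupByKey().collect()

    AllTrainingDATA_RDD_Final.foreach(println)
    spark.stop()
  }
}
```

If only the class is wanted as the value, the pattern match can return ((x, y), cls) instead, at the cost of dropping ICCH_ID from the result.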