I am new to Stack Overflow and to Spark; I am basically doing RDD transformations.
My input data:
278222631,2763985,10.02.12,01.01.53,Whatsup,NA,Email,Halter,wagen,28.06.12,313657794,VW,er,i,B,0,23.11.11,234
298106482,2780663,22.02.12,22.02.12,Whatsup,NA,WWW,Halter,wagen,26.06.12,284788860,VW,er,i,B,0,02.06.04,123
My RDD format:
val dateCov: RDD[(Long, Long, String, String, String, String, String, String, String, String, Long, String, String, String, String, String, String, Long)]
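For context, this is roughly how I build dateCov from the input (the file path and the use of sc.textFile here are just illustrative):

// Illustrative only: parse each comma-separated line into the 18-field tuple above.
// "input.csv" is a placeholder path.
val dateCov = sc.textFile("input.csv").map { line =>
  val f = line.split(",")
  (f(0).toLong, f(1).toLong, f(2), f(3), f(4), f(5), f(6), f(7), f(8), f(9),
   f(10).toLong, f(11), f(12), f(13), f(14), f(15), f(16), f(17).toLong)
}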
I am doing a reduceByKey transformation, mapping each record to a (key, value) pair with columns 1 and 11 as the key and column 18 as the value, and applying a function inside reduceByKey. For example:
val reducedSortedRDD = dateCov
  .map(r => ((r._1, r._11), r._18))            // key: (col1, col11), value: col18
  .reduceByKey((x, y) => math.min(x, y))       // keep the minimum value per key
  .map { case ((k1, k2), v) => (k1, k2, v) }   // flatten back to a triple
  .sortBy(_._1, ascending = true)              // sort by the first key column
- My question: after the reduceByKey, is it possible to get all the other columns back, i.e. to have reducedSortedRDD be of type
RDD[(Long, Long, String, String, String, String, String, String, String, String, Long, String, String, String, String, String, String, Long)]
rather than RDD[(Long, Long, Long)] as in this case?
- Am I doing this right? Basically, I want to keep the whole initial record after the reduceByKey transformation, not just a subset of its columns; see the sketch below.
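To make that concrete, here is a minimal sketch of one approach I am considering, assuming it is fine to carry the entire tuple through as the value (reducedFull is just a name for illustration):

// Sketch: key by (col1, col11), keep the entire record as the value,
// and reduce by comparing only the 18th field.
val reducedFull = dateCov
  .map(r => ((r._1, r._11), r))                        // value is the full record
  .reduceByKey((a, b) => if (a._18 <= b._18) a else b) // keep record with min col18
  .values                                              // drop the key, keep full tuples
  .sortBy(_._1, ascending = true)
// reducedFull: RDD[(Long, Long, String, ..., Long)] -- all 18 columns preserved

Is something along these lines the right pattern, or is there a better way?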
I am using Spark 1.4.