
I'm new to Spark and the issue below has been bothering me for a while.

My input file is a comma-separated file, and I have created an RDD with the store as key and a list of promotions as value. A key (in my case, a store) can have more than one value. I have grouped the RDD using groupByKey, which brings together all the promotions that fall under the same key. Up to this point everything is fine. Now I want to iterate through the list of values for each key to find out whether that key (store) has a particular promotion. If the key has that promotion, I want to write the record out with the store (key) and the promotion (value).

val firstRDD = sc.textFile(".....")
// the (key, value) pair needs its own parentheses to form a tuple
val secondRDD = firstRDD.map(line => (line.split(",")(0), line.split(",")(1)))
val thirdRDD = secondRDD.groupByKey()

After groupByKey, thirdRDD contains:

(1,(aaa,bbb,ccc,ddd))
(2,(aaa,ccc))
(3,(ddd,aaa))

Based on the above list, I want to know whether the value aaa exists for key 1, and if aaa does not, whether bbb does. How can I do this in Spark with Scala?

Think about it as a basic Scala operation before using map. If you have a tuple, how would you process it to get what you want? - eliasah
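
As a plain-Scala illustration of that hint (taking one grouped record shaped like the sample output above, outside of Spark), the per-key check is just a membership test on the value collection:

// one grouped record, as an ordinary Scala tuple
val record: (Int, Seq[String]) = (1, Seq("aaa", "bbb", "ccc", "ddd"))

// does key 1 carry promotion "aaa"?
val hasAaa = record._2.contains("aaa")  // true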

1 Answer


Because your RDD is of type Tuple2, you can use the PairRDD functionality. This means you have access to the lookup function on your RDD. To see whether key 1 has a corresponding value of aaa, the easiest way would probably be (note that the keys read from the text file are Strings, so you look up "1" rather than 1):

secondRDD.lookup("1").contains("aaa")

Note that it is easier to use secondRDD as opposed to thirdRDD for this.
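
If the check needs to fall back from one promotion to the next (aaa first, then bbb, as asked in the question), a minimal sketch building on the same lookup call could look like this (the promotion names are taken from the sample data above):

// all promotions recorded for store "1"; lookup returns a Seq of the values for that key
val promosForStore1 = secondRDD.lookup("1")

// try "aaa" first, then "bbb"; yields Some(promotion) if either is present, otherwise None
val firstMatch = Seq("aaa", "bbb").find(p => promosForStore1.contains(p))

If the same check is needed for every store at once, filtering the grouped RDD instead, e.g. thirdRDD.filter { case (_, promos) => promos.exists(_ == "aaa") }, avoids calling lookup once per key and keeps the work distributed.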