
I have an RDD of the following format:

scala> user_freq_movie
res0: org.apache.spark.rdd.RDD[(Int, List[Int])] = ShuffledRDD[23]

One element looks like this:

// (userID, List(freqMovies))
scala> user_freq_movie.first
res1: (Int, List[Int]) = (1,List(102, 101, 98, 100))

I want to transform this into a new RDD of key-value pairs, where the key is the user ID and the value is a pair of movies from that user's list, e.g.:

1,(102,101)
1,(102,98)
1,(102,100)
1,(101,98)

I can currently generate all the pairs using combinations, but I lose the user each pair came from. How can I keep the user ID alongside each pair in Spark? These are the transformations I use to generate all pairs from the RDD:

val allpairs = user_freq_movie
  .flatMap(line => line._2.combinations(2).toSeq)
  .map(_.sorted)
  .map { case List(a, b) => (a, b) } // List has no toTuple2, so pattern match instead

1 Answer


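Use flatMapValues. The function you pass receives only the value (the movie list), not the whole (key, value) tuple, so there is no ._2 to unpack, and the user ID is kept as the key of every generated element:

user_freq_movie.flatMapValues(movies => movies.combinations(2))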
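
If it helps to see what this produces without a cluster, here is a minimal sketch that uses plain Scala collections in place of the RDD. The names userFreqMovie, moviePairs and userPairs are made up for illustration, and the sample record is the one from the question. On a real RDD the last step should reduce to flatMapValues(moviePairs).

```scala
// Plain-collections stand-in for the RDD, holding the sample record from the question.
val userFreqMovie: Seq[(Int, List[Int])] = Seq((1, List(102, 101, 98, 100)))

// Hypothetical helper: every 2-element combination of a user's movies, as tuples.
def moviePairs(movies: List[Int]): List[(Int, Int)] =
  movies.combinations(2).collect { case List(a, b) => (a, b) }.toList

// Mirrors what flatMapValues does: keep the key, emit one element per generated value.
val userPairs: Seq[(Int, (Int, Int))] =
  userFreqMovie.flatMap { case (user, movies) =>
    moviePairs(movies).map(pair => (user, pair))
  }

userPairs.foreach(println)
```

Converting each two-element List to a tuple inside the helper gives the (userID, (movie, movie)) shape asked for in the question. If you also want each pair ordered, sort the list before calling combinations.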