I have an ordered RDD of type ((id, ts), value). It was partitioned using a custom partitioner on the id field only, i.e.:
math.abs(id.hashCode % numPartitions)
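
For context, the partitioner looks roughly like the sketch below (IdPartitioner is just a name I am using for this post):

import org.apache.spark.Partitioner

class IdPartitioner(override val numPartitions: Int) extends Partitioner {
  // Partition on the id part of the (id, ts) key only
  override def getPartition(key: Any): Int = key match {
    case (id: Long, _) => math.abs(id.hashCode % numPartitions)
    case _             => 0
  }
}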
Now, if I run the two functions below on this partitioned RDD, will that involve shuffling and re-partitioning of the dataset?
val partitionedRDD: RDD[((Long, Long), String)] = <Some Function>  // key = (id, ts), value = String
val flatRDD = partitionedRDD.map { case ((id, ts), value) => (id, (ts, value)) }
What I want to know is whether flatRDD.groupByKey() and flatRDD.reduceByKey() will keep the same partitioning as partitionedRDD, or whether Spark will shuffle the dataset again and create new partitions.
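
For reference, this is the kind of check I have in mind (a sketch, assuming partitionedRDD and flatRDD from above are in scope; the reduce function just keeps the pair with the larger ts and is only there for illustration):

// Each RDD exposes its Option[Partitioner]; comparing them before and
// after the aggregations is what my question is about
println(partitionedRDD.partitioner)
println(flatRDD.partitioner)
println(flatRDD.groupByKey().partitioner)
println(flatRDD.reduceByKey((a, b) => if (a._1 >= b._1) a else b).partitioner)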
Thanks, Devi
