
I am aware that for a pair RDD (key, value) we can supply our own partitioning scheme or partition it with one of the default partitioners (hash and range). But is there a way to partition a normal RDD using our own partitioner class?

Thanks!

1 Answer


You need to extend the abstract class org.apache.spark.Partitioner and implement its two methods, numPartitions and getPartition:

import java.time.LocalDate
import org.apache.spark.Partitioner

class WeekDayPartitioner extends Partitioner {
  override def numPartitions: Int = 7
  // getDayOfWeek returns a DayOfWeek enum; getValue gives 1 (Monday) to 7 (Sunday),
  // and partition indices must lie in [0, numPartitions), hence the -1
  override def getPartition(key: Any): Int =
    key.asInstanceOf[LocalDate].getDayOfWeek.getValue - 1
}

val partitioner = new WeekDayPartitioner()
myRdd.partitionBy(partitioner) // myRdd: RDD[(LocalDate, String)]

Note that partitionBy comes from PairRDDFunctions, so it is only available on RDDs of key-value pairs.
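For a plain (non-pair) RDD, one option is to first turn it into a pair RDD, e.g. with keyBy, and then apply the custom partitioner. A minimal sketch, assuming an RDD of log lines and a hypothetical extractDate helper that parses a LocalDate out of each line:

```scala
import java.time.LocalDate
import org.apache.spark.rdd.RDD

// Hypothetical plain RDD of log lines (sc is an existing SparkContext)
val lines: RDD[String] = sc.textFile("events.log")

// Hypothetical helper: parse the leading ISO date from a line like "2016-03-01 ..."
def extractDate(line: String): LocalDate =
  LocalDate.parse(line.takeWhile(_ != ' '))

// keyBy turns RDD[String] into RDD[(LocalDate, String)],
// which makes partitionBy (from PairRDDFunctions) available
val keyed: RDD[(LocalDate, String)] = lines.keyBy(extractDate)

// Apply the custom partitioner from the answer above
val partitioned = keyed.partitionBy(new WeekDayPartitioner())

// If only the original elements are needed afterwards, drop the keys again
val repartitioned: RDD[String] = partitioned.values
```

Dropping the keys with .values keeps the partitioning of the underlying data, but the resulting RDD no longer carries a partitioner, so Spark cannot exploit it in later co-partitioned operations.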