I am aware that for a PairRDD (key, value), we can supply our own partitioning scheme or use one of the default partitioners (hash and range). But is there a way to partition a normal RDD using our own partitioner class?
Thanks!
You need to extend the abstract class org.apache.spark.Partitioner
and implement its two methods:
import java.time.LocalDate
import org.apache.spark.Partitioner

class WeekDayPartitioner extends Partitioner {
  override def numPartitions: Int = 7
  // getDayOfWeek returns a DayOfWeek enum (MONDAY = 1 … SUNDAY = 7),
  // so convert it to a 0-based partition index
  override def getPartition(key: Any): Int =
    key.asInstanceOf[LocalDate].getDayOfWeek.getValue - 1
}
val partitioner = new WeekDayPartitioner()
// partitionBy returns a new RDD, so keep the result
val partitioned = myRdd.partitionBy(partitioner) // myRdd: RDD[(LocalDate, String)]
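
As for the original question: partitionBy is only defined on pair RDDs (via PairRDDFunctions), so a normal RDD has to be turned into (key, value) form first, for example with keyBy. A minimal sketch, assuming an RDD of dates (the sample data is hypothetical):

import java.time.LocalDate
import org.apache.spark.rdd.RDD

// Hypothetical plain (non-pair) RDD of dates
val dates: RDD[LocalDate] = sc.parallelize(Seq(
  LocalDate.of(2016, 1, 4),
  LocalDate.of(2016, 1, 5)
))

// keyBy(identity) produces RDD[(LocalDate, LocalDate)],
// which exposes partitionBy; the values here just duplicate the keys
val partitionedByDay = dates
  .keyBy(identity)
  .partitionBy(new WeekDayPartitioner())

Any function can be passed to keyBy, so you can derive whatever key your partitioner expects from each element.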