I'm using spark for processing large files, I have 12 partitions.
I have rdd1 and rdd2 i make a join between them, than select (rdd3).
My problem is, i consulted that the last partition is too big than other partitions, from partition 1 to partitions 11 45000 recodrs
but the partition 12 9100000 recodrs
.
so i divided 9100000 / 45000 =~ 203
. i repartition my rdd3 into 214(203+11)
but i last partition still too big.
How i can balance the size of my partitions ?
My i write my own custom partitioner?
repartition
andpartitionBy
– Zied Hermi