I need executors to finish processing data at different times.
I think the easiest way is to make RDD partitions have not uniform sizes. How can I do this?
Not sure what you are trying to achieve, but you can partition the RDD anyway you like using partitionBy
eg:
sc.parallelize(xrange(10)).zipWithIndex()
.partitionBy(2, lambda x: 0 if x<2 else 1)
.glom().collect()
[[(0, 0), (1, 1)], [(2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9)]]
Note that it works on a (k,v) RDD and the partitioning function takes only k as a param