I'm quite new to Spark and I have two questions:
- I have a large set of points, from which I built an RDD (called `partitionedData`) and partitioned it with a custom partitioner so that each partition holds at most a threshold number of points. Because I need to choose some points as leaders in each partition, and to be sure that the leaders and the points of each partition are on the same node, I call `mapPartitions` on `partitionedData` and set the `preservesPartitioning` flag to `true`. The result of this call is my desired leader RDD. Here is my first question: I know that the leader RDD preserves its parent RDD's partitioning (co-partitioned), but I'm not sure whether the leaders in each partition will be placed on the same node as their parent points (co-located).
- If the answer to the question above is no, how can I co-locate the partitions of a given RDD with another pre-partitioned RDD?
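
For concreteness, the setup described above might look roughly like the following Scala sketch. It is only an illustration under assumptions: `IdPartitioner`, the toy data, the `local[2]` master, and the "first point is the leader" rule are all hypothetical placeholders, not the real partitioner or leader-selection logic.

```scala
import org.apache.spark.{Partitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for the custom partitioner: keys are assumed to be
// precomputed partition ids in the range [0, parts).
class IdPartitioner(parts: Int) extends Partitioner {
  override def numPartitions: Int = parts
  override def getPartition(key: Any): Int = key.asInstanceOf[Int]
}

object LeaderSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("leader-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Toy points, each keyed by an assumed partition id.
    val points: RDD[(Int, (Double, Double))] = sc.parallelize(Seq(
      (0, (0.0, 0.0)), (0, (1.0, 1.0)),
      (1, (5.0, 5.0)), (1, (6.0, 6.0))
    ))

    // Partition the points with the custom partitioner.
    val partitionedData = points.partitionBy(new IdPartitioner(2))

    // Choose leaders per partition (placeholder rule: the first point),
    // keeping the parent's partitioner on the resulting RDD.
    val leaders = partitionedData.mapPartitions(
      iter => iter.take(1),
      preservesPartitioning = true
    )

    // Compare the partitioner metadata of the two RDDs.
    println(leaders.partitioner == partitionedData.partitioner)

    sc.stop()
  }
}
```

The final comparison only checks that the two RDDs share partitioner metadata (co-partitioning). It says nothing about which node each partition is computed or stored on, which is exactly what the questions above are about.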