I'm running Spark 2.0 in standalone mode, and I'm the only one submitting jobs to my cluster.
Suppose I have an RDD with 100 partitions, and only 10 of those partitions can fit in memory at any one time.
Let's also assume that the allotted execution memory is sufficient and will not interfere with storage memory.
Suppose I iterate over the data in that RDD.
import org.apache.spark.storage.StorageLevel

rdd.persist(StorageLevel.MEMORY_ONLY)  // cache in memory only, no spilling to disk
for (_ <- 0 until 10) {
  rdd.map(...).reduce(...)             // some transformation followed by an action
}
rdd.unpersist()
On each iteration, will the first 10 partitions that get cached always remain in memory until rdd.unpersist() is called?
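
In case it helps, here is a rough sketch of how I was thinking of checking which partitions stay cached between iterations. The data and operations are purely illustrative, and I'm assuming sc.getRDDStorageInfo (a developer API) is a reasonable way to inspect this:

import org.apache.spark.storage.StorageLevel

// Illustrative RDD with 100 partitions (the real data is larger and wouldn't fully fit in memory).
val rdd = sc.parallelize(1 to 1000000, numSlices = 100)
rdd.persist(StorageLevel.MEMORY_ONLY)

for (i <- 0 until 10) {
  rdd.map(_ * 2).reduce(_ + _)
  // Report how many partitions of this RDD are currently cached.
  sc.getRDDStorageInfo
    .find(_.id == rdd.id)
    .foreach(info => println(s"iteration $i: ${info.numCachedPartitions} of ${info.numPartitions} partitions cached"))
}
rdd.unpersist()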