In PySpark, I can create an RDD from a list and choose how many partitions it should have:
from pyspark import SparkContext

sc = SparkContext()
sc.parallelize(range(0, 10), 4)  # xrange in Python 2
How does the number of partitions I choose for my RDD influence performance? And how does this depend on the number of cores my machine has?