
I keep seeing Apache Spark schedule a series of stages with a fixed number of 200 tasks. Since this keeps happening across a number of different jobs, I am guessing it is somehow related to one of Spark's configuration settings. Any suggestion what that setting might be?


1 Answer


200 is the default number of partitions used during shuffles, and it is controlled by spark.sql.shuffle.partitions. Its value can be set at runtime using SQLContext.setConf:

sqlContext.setConf("spark.sql.shuffle.partitions", "42")

or with RuntimeConfig.set on a SparkSession:

spark.conf.set("spark.sql.shuffle.partitions", 42)
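If you want the setting to apply for the whole application, you can also pass it when the SparkSession is built. Below is a minimal Scala sketch of that approach; the app name, the local master, and the value 42 are just illustrative choices, not anything prescribed by Spark.

// Minimal sketch: configure shuffle partitions at session creation time.
import org.apache.spark.sql.SparkSession

object ShufflePartitionsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("shuffle-partitions-demo")            // illustrative name
      .master("local[*]")                            // illustrative master
      .config("spark.sql.shuffle.partitions", "42")  // replaces the 200 default
      .getOrCreate()

    import spark.implicits._

    // Any shuffle-inducing operation (groupBy, join, ...) now produces
    // 42 post-shuffle partitions instead of 200.
    val counts = Seq("a", "b", "a", "c").toDF("key").groupBy("key").count()

    // Typically prints 42; on Spark 3.x, adaptive query execution may
    // coalesce this to fewer partitions unless it is disabled.
    println(counts.rdd.getNumPartitions)

    spark.stop()
  }
}

Setting it at build time is convenient when every query in the job should use the same shuffle parallelism; the runtime setters shown above are better when you want to tune it per query.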