You partition, and repartition for following reasons:
- To make sure we have enough work to distribute to the distinct cores in our cluster (nodes * cores_per_node). Obviously we need to tune the number of executors, cores per executor, and memory per executor to make that happen as intended.
- To make sure we evenly distribute work: the smaller the partitions, the lesser the chance than one core might have much more work to do than all other cores. Skewed distribution can have a huge effect on total lapse time if the partitions are too big.
- To keep partitions in managable sizes. Not to big, and not to small so we dont overtax GC. Also bigger partitions might have issues when we have non-linear O.
- To small partitions will create too much process overhead.
As you might have noticed, there will be a goldilocks zone. Testing will help you determine ideal partition size.
Note that it is ok to have much more partitions than we have cores. Queuing partitions to be assigned a task is something that I design for.
Also make sure you configure your spark job properly otherwise:
- Make sure you do not have too many executors. One or Very Few executors per node is more than enough. Fewer executors will have less overhead, as they work in shared memory space, and individual tasks are handled by threads instead of processes. There is a huge amount of overhead to starting up a process, but Threads are pretty lightweight.
- Tasks need to talk to each other. If they are in the same executor, they can do that in-memory. If they are in different executors (processes), then that happens over a socket (overhead). If that is over multiple nodes, that happens over a traditional network connection (more overhead).
- Assign enough memory to your executors. When using Yarn as the scheduler, it will fit the executors by default by their memory, not by the CPU you declare to use.
I do not know what your situation is (you made the node names invisible), but if you only have a single node with 15 cores, then 16 executors do not make sense. Instead, set it up with One executor, and 16 cores per executor.
Yarnyou can set the resources manager toFAIR- Alberto Bonsanto