Question
Is there a "spark.default.parallelism" equivalent for Spark SQL DataFrames for narrow transformations (map, filter, etc.)?
Background
Apparently, partition control differs between the RDD and DataFrame APIs. DataFrames have spark.sql.shuffle.partitions to control the number of partitions produced by shuffles (wide transformations, if I understand correctly), and "spark.default.parallelism" would have no effect.
How Spark dataframe shuffling can hurt your partitioning
But what does shuffling have to do with partitioning? Well, nothing really if you are working with RDDs…but with dataframes, that’s a different story. ... As you can see the partition number suddenly increases. This is due to the fact that the Spark SQL module contains the following default configuration: spark.sql.shuffle.partitions set to 200.
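To see what I mean, here is a minimal sketch of the behavior I'm describing (Scala, local mode; the object name is just for illustration). A narrow transformation like filter keeps the parent DataFrame's partition count, while a groupBy shuffle jumps to the spark.sql.shuffle.partitions default of 200. Note that on Spark 3.2+, adaptive query execution (AQE) is on by default and may coalesce shuffle partitions, so I disable it here to keep the counts deterministic:

```scala
import org.apache.spark.sql.SparkSession

object ShufflePartitionsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("shuffle-partitions-demo")
      .config("spark.sql.adaptive.enabled", "false") // keep counts deterministic
      .getOrCreate()
    import spark.implicits._

    // Start from a DataFrame with a known partition count.
    val df = spark.range(0, 1000).repartition(8).toDF("id")
    println(df.rdd.getNumPartitions)       // 8

    // Narrow transformation: no shuffle, partition count is preserved.
    val filtered = df.filter($"id" % 2 === 0)
    println(filtered.rdd.getNumPartitions) // still 8

    // Wide transformation: shuffle output uses spark.sql.shuffle.partitions.
    val grouped = df.groupBy($"id" % 10).count()
    println(grouped.rdd.getNumPartitions)  // 200 by default

    spark.stop()
  }
}
```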
The article below suggests that spark.default.parallelism would not work for DataFrames.
What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism?
The spark.default.parallelism is the default number of partitions in RDDs returned by transformations like join, reduceByKey, and parallelize when not set explicitly by the user. But spark.default.parallelism seems to only work for raw RDDs and is ignored when working with DataFrames.
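This is consistent with what I observe in a quick experiment (a sketch, with both settings chosen arbitrarily and AQE disabled so the counts are stable): spark.default.parallelism is picked up by RDD operations like parallelize and reduceByKey, while a DataFrame shuffle follows spark.sql.shuffle.partitions instead.

```scala
import org.apache.spark.sql.SparkSession

object ParallelismVsShufflePartitions {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("parallelism-demo")
      .config("spark.default.parallelism", "6")      // RDD-side knob
      .config("spark.sql.shuffle.partitions", "12")  // DataFrame-side knob
      .config("spark.sql.adaptive.enabled", "false") // keep counts deterministic
      .getOrCreate()
    import spark.implicits._
    val sc = spark.sparkContext

    // RDD API: parallelize and reduceByKey honor spark.default.parallelism.
    val rdd = sc.parallelize(1 to 100)
    println(rdd.getNumPartitions)          // 6

    val reduced = rdd.map(x => (x % 5, x)).reduceByKey(_ + _)
    println(reduced.getNumPartitions)      // 6

    // DataFrame API: a shuffle uses spark.sql.shuffle.partitions instead.
    val df = spark.range(100).toDF("id")
    val grouped = df.groupBy($"id" % 5).count()
    println(grouped.rdd.getNumPartitions)  // 12, not 6

    spark.stop()
  }
}
```

So the question stands: for DataFrames, is there any setting that controls the partitioning of narrow transformations the way spark.default.parallelism does for RDDs?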