I am running Spark in cluster mode and reading data from an RDBMS via JDBC.
As per the Spark docs, these partitioning parameters describe how to partition the table when reading in parallel from multiple workers:
partitionColumn
lowerBound
upperBound
numPartitions
These are optional parameters.
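
For context, here is my rough understanding of how the four parameters turn into per-partition range queries. This is only a simplified sketch in plain Python, not Spark's actual code: the function name `range_predicates` and the integer-only stride are my own assumptions. As far as I understand, the bounds only decide the stride and do not filter rows, which is why the first and last ranges are open-ended.

```python
def range_predicates(column, lower_bound, upper_bound, num_partitions):
    """Return one WHERE-clause string per partition (hypothetical helper).

    Simplified illustration of stride-based splitting on an integer column.
    A single unfiltered partition is represented as [None].
    """
    # Never create more partitions than there are distinct integer values.
    num_partitions = min(num_partitions, upper_bound - lower_bound)
    if num_partitions <= 1:
        # Nothing to split: one partition, no filter.
        return [None]

    stride = (upper_bound - lower_bound) // num_partitions
    predicates = []
    current = lower_bound
    for i in range(num_partitions):
        # The first partition has no lower limit, the last has no upper limit,
        # so rows outside [lower_bound, upper_bound) are still read.
        lo = f"{column} >= {current}" if i != 0 else None
        current += stride
        hi = f"{column} < {current}" if i != num_partitions - 1 else None

        if lo is None:
            # NULLs in the partition column go to the first partition.
            predicates.append(f"{hi} OR {column} IS NULL")
        elif hi is None:
            predicates.append(lo)
        else:
            predicates.append(f"{lo} AND {hi}")
    return predicates


if __name__ == "__main__":
    for p in range_predicates("id", 0, 100, 4):
        print(p)
```

With `partitionColumn="id"`, `lowerBound=0`, `upperBound=100` and `numPartitions=4`, this sketch yields four ranges with a stride of 25, each of which would be issued as a separate query by a separate task.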
What happens if I don't specify these parameters?
- Does only one worker read the whole table?
- If it still reads in parallel, how does it partition the data?