I am trying to run a process using the Scala Futures API to run certain actions in parallel. Below is a sample code snippet:
import java.util.concurrent.Executors

import org.apache.spark.sql.SparkSession
import scala.util._
import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global

object ConcurrentContext {
  def appMain(args: Array[String]): Unit = {
    // configure spark
    val spark = SparkSession
      .builder
      .appName("parjobs")
      .getOrCreate()
    // fixed-size thread pool for driver-side work
    val pool = Executors.newFixedThreadPool(5)
    // create the implicit ExecutionContext based on our thread pool
    implicit val xc = ExecutionContext.fromExecutorService(pool)
    // ...
  }

  /** Wraps a code block in a Future and returns the future */
  def executeAsync[T](f: => T): Future[T] = {
    Future(f)
  }
}
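To make the intent concrete, here is a minimal, self-contained sketch of how `executeAsync` might be used to run several actions in parallel on the 5-thread pool. Spark is omitted; the object name `AsyncSketch` and the squaring work inside each future are hypothetical placeholders for the real actions.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object AsyncSketch {
  def main(args: Array[String]): Unit = {
    // same setup as in the question: a 5-thread pool backing an ExecutionContext
    val pool = Executors.newFixedThreadPool(5)
    implicit val xc: ExecutionContext = ExecutionContext.fromExecutorService(pool)

    /** Wraps a code block in a Future and returns the future */
    def executeAsync[T](f: => T): Future[T] = Future(f)

    // launch five independent actions; at most 5 run concurrently
    val futures = (1 to 5).map(i => executeAsync { i * i })

    // gather all results (blocking here only for the sake of the demo)
    val results = Await.result(Future.sequence(futures), 30.seconds)
    println(results) // Vector(1, 4, 9, 16, 25)

    pool.shutdown()
  }
}
```

Note that `Await.result` is used only to keep the demo deterministic; in a real application you would typically compose the futures instead of blocking.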
My questions are:

1. If I set the executor-cores value to 4 (which controls the number of task threads per executor JVM) and create a thread pool of 5 inside the application, which one takes precedence?

2. If I don't explicitly set the thread pool, the default ExecutionContext will create a thread pool based on all the cores present on the machine from which the process is initiated (which would be the driver). In that situation, how does the executor-cores property come into play?

3. If the thread-pool size takes precedence over executor-cores, and I use the default, is it possible to end up with many threads (equal to the number of CPU cores) per JVM?
SparkSession (including spark.executor.cores) only affects the parallelism of tasks Spark performs on a distributed data structure (RDD, DataFrame, Dataset). Everything else you do happens on the driver machine itself, independent of Spark, and you can control its parallelism using thread pools etc. - suj1th
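The point above can be checked empirically: the fixed pool alone bounds driver-side concurrency, regardless of any Spark setting. Below is a minimal sketch (no Spark involved) that counts how many futures run at once; the object name `PoolCapSketch` and the `Thread.sleep` workload are hypothetical stand-ins for real driver-side work.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object PoolCapSketch {
  def run(): Int = {
    val pool = Executors.newFixedThreadPool(5)
    implicit val xc: ExecutionContext = ExecutionContext.fromExecutorService(pool)

    val active  = new AtomicInteger(0) // futures currently executing
    val maxSeen = new AtomicInteger(0) // high-water mark of concurrency

    // submit 20 tasks to a 5-thread pool
    val futures = (1 to 20).map { _ =>
      Future {
        val now = active.incrementAndGet()
        maxSeen.updateAndGet(m => math.max(m, now))
        Thread.sleep(50) // simulate driver-side work
        active.decrementAndGet()
      }
    }
    Await.result(Future.sequence(futures), 1.minute)
    pool.shutdown()
    maxSeen.get // never exceeds 5, the pool size
  }

  def main(args: Array[String]): Unit =
    println(s"max concurrent driver-side tasks: ${run()}")
}
```

The high-water mark stays at or below 5 no matter how many cores the driver machine has or what executor-cores is set to, since those settings govern task slots on the executors, not threads on the driver.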