
How can I run two spark-submits at the same time? I have a plain Spark setup (no extra configuration on my PC) with 4 cores allocated.

If I try to submit an app twice, the second one gets "WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources"

Code:

from __future__ import print_function

import sys
from operator import add

from pyspark.sql import SparkSession

if __name__ == "__main__":

    spark = SparkSession\
        .builder\
        .appName("test")\
        .getOrCreate()

    rdd = spark.sparkContext.parallelize(xrange(1000000000), 100)
    print(rdd.sample(False, 0.1, 81).count())

    spark.stop()

How I try to start them: ./spark-submit --master spark://myaddresshere:7077 --name "app1" --conf spark.shuffle.service.enabled=true /path_to_py_file.py

I know that I can pre-set the number of cores to use, but my goal is to allocate the resources dynamically: if only 1 task is running => it consumes 100%; if there are 4 tasks => 25% each.

I've tried multiple options but without luck.

Any hint will be appreciated.

Spark dynamic allocation works only at the executor level, which means it will allocate more executors when tasks are piling up in the queue. Whether additional executors can be allocated depends on your setup and the available resources. Also keep in mind that when running locally, the driver also needs cores. - LiMuBei

1 Answer


It looks like you are running locally, so there is no resource manager like YARN to distribute resources. Your app probably runs with

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setMaster("local[*]")  // "local[*]" = use every available core

which tells Spark to use all available cores. You can't use a dynamic value here that would depend on future submits.
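
For reference, the PySpark equivalent would look like this (a sketch, not taken from your code) - whatever master string you pick is fixed when the session is created:

from pyspark.sql import SparkSession

# Sketch only: the master string is baked in at startup,
# e.g. "local[2]" uses 2 cores, "local[*]" grabs them all.
spark = SparkSession.builder \
    .master("local[2]") \
    .appName("app1") \
    .getOrCreate()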

What you're asking for is a resource manager that will distribute resources evenly between applications... I don't know of anything that offers exactly that.

A configuration like dynamic allocation will allow a resource manager to give the app resources according to its needs, but this will not necessarily be 50% for 2 apps (it probably won't be).
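
For completeness, here is a rough sketch of what enabling dynamic allocation for one app could look like from the PySpark side. The config keys are standard Spark settings, but the external shuffle service also has to be running on the workers, and the executor counts below are just placeholders:

from pyspark.sql import SparkSession

# Sketch, assuming a standalone cluster with the external shuffle
# service running on every worker; min/max executors are arbitrary.
spark = SparkSession.builder \
    .appName("app1") \
    .config("spark.dynamicAllocation.enabled", "true") \
    .config("spark.shuffle.service.enabled", "true") \
    .config("spark.dynamicAllocation.minExecutors", "1") \
    .config("spark.dynamicAllocation.maxExecutors", "2") \
    .getOrCreate()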

To my knowledge, you have no choice but to "tell" Spark how many executors to use and how many cores each executor gets (using spark-submit arguments or spark-defaults configurations) so that the resources are evenly distributed.
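
For example (a sketch with assumed numbers, not the only way to split things), capping each app at 2 of your 4 cores lets two submits run side by side on the standalone master; the same cap can be passed on the command line with --total-executor-cores 2 on spark-submit:

from pyspark.sql import SparkSession

# Sketch: this app is limited to 2 cores in total, 2 per executor,
# so a second spark-submit still finds free cores on the worker.
spark = SparkSession.builder \
    .appName("app1") \
    .config("spark.cores.max", "2") \
    .config("spark.executor.cores", "2") \
    .getOrCreate()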