Spark shell not executing 'Initial job has not accepted any resources'

Question

I'm working on AWS EC2 instance where I installed Spark 2.2.0 and I have 8 GB of RAM and 2 cores.

I was following this tutorial to play a little with pyspark shell:
https://sparkour.urizone.net/recipes/managing-clusters/

I started the master and I started one slave worker and they show up on the web ui.

However, in the shell, when I try to execute a command like:

>>> tf = spark.sparkContext.textFile('README.md')
>>> tf.count()

I get this:

[Stage 0:> (0 + 0) / 2]
17/08/29 11:02:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

In my spark-env.sh, I set variables like this:

SPARK_LOCAL_IP=127.0.0.1
SPARK_MASTER_HOST=127.0.0.1
SPARK_WORKER_INSTANCES=2
SPARK_WORKER_MEMORY=1000m
SPARK_WORKER_CORES=1

So, I don't know why there is a problem. The pyspark shell doesn't reach the worker slave properly I guess.

I modified spark-defaults.conf to add 'spark.driver.memory 3g' but it still doesn't work — Ahmedn1

Michel Lemay Michel Lemay · Accepted Answer · 2017-08-29T11:27:11

In this setup I would start spark with settings like this:

spark-shell (or spark-submit) --master local[*] --driver-memory 4G ...

From one of my comments:

With such a small machine, I suspect you won't be able to run on cluster mode. The thing is that the spark driver takes resources as well as the two other workers. In this scenario, you have 1 core driver + 2 workers * 1 core . You could try to resize down the number of workers to 1 and that should work.

Spark shell not executing 'Initial job has not accepted any resources'

1 Answers