I'm working on an AWS EC2 instance where I installed Spark 2.2.0; the instance has 8 GB of RAM and 2 cores.
I was following this tutorial to play around a little with the pyspark shell:
https://sparkour.urizone.net/recipes/managing-clusters/
I started the master and one slave worker, and both show up in the web UI.
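For reference, I started them roughly like this (the Spark install path here is just an example of where mine lives; the master URL matches the SPARK_MASTER_HOST setting shown further down):

```shell
# Start the standalone master (listens on 127.0.0.1:7077 per spark-env.sh)
cd /opt/spark-2.2.0
sbin/start-master.sh

# Start a slave worker and register it with the master
sbin/start-slave.sh spark://127.0.0.1:7077
```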
However, in the shell, when I try to execute a command like:
>>> tf = spark.sparkContext.textFile('README.md')
>>> tf.count()
I get this:
[Stage 0:> (0 + 0) / 2]
17/08/29 11:02:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
In my spark-env.sh, I set the following variables:
SPARK_LOCAL_IP=127.0.0.1
SPARK_MASTER_HOST=127.0.0.1
SPARK_WORKER_INSTANCES=2
SPARK_WORKER_MEMORY=1000m
SPARK_WORKER_CORES=1
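For completeness, this is roughly how I launch the shell against the master (a sketch; the --master flag is what the tutorial uses to point the shell at the standalone cluster):

```shell
# Attach the pyspark shell to the standalone master from spark-env.sh
bin/pyspark --master spark://127.0.0.1:7077
```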
So I don't know why there's a problem. My guess is that the pyspark shell isn't reaching the slave worker properly.