When I run my Spark app with sbt run, with the configuration pointing at the master of a remote cluster, nothing useful gets executed by the workers and the following warning is printed repeatedly in the sbt run log:
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
This is what my Spark config looks like:
@transient lazy val conf: SparkConf = new SparkConf()
  .setMaster("spark://master-ip:7077")
  .setAppName("HelloWorld")
  .set("spark.executor.memory", "1g")
  .set("spark.driver.memory", "12g")

@transient lazy val sc: SparkContext = new SparkContext(conf)

val lines = sc.textFile("hdfs://master-public-dns:9000/test/1000.csv")
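For completeness, here is roughly the whole app I launch with sbt run, trimmed to a minimal sketch (the object name and the count() action are placeholders I've added here for illustration; in my real code conf and sc are the @transient lazy vals shown above):

import org.apache.spark.{SparkConf, SparkContext}

object HelloWorld {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("spark://master-ip:7077")
      .setAppName("HelloWorld")
      .set("spark.executor.memory", "1g")
      .set("spark.driver.memory", "12g")

    val sc = new SparkContext(conf)
    val lines = sc.textFile("hdfs://master-public-dns:9000/test/1000.csv")
    // Placeholder action so a job is actually submitted to the cluster
    println(lines.count())
    sc.stop()
  }
}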
I know this warning usually appears when the cluster is misconfigured and the workers either don't have the resources or aren't started in the first place. However, according to my Spark UI (on master-ip:8080) the worker nodes are alive with sufficient RAM and CPU cores. They even try to execute my app, but the executors exit and leave this in their stderr log:
INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled;
users with view permissions: Set(ubuntu, myuser);
groups with view permissions: Set(); users with modify permissions: Set(ubuntu, myuser); groups with modify permissions: Set()
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
...
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply from 192.168.0.11:35996 in 120 seconds
... 8 more
ERROR RpcOutboxMessage: Ask timeout before connecting successfully
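My current guess, based on the timeout address (192.168.0.11:35996), is that the executors are trying to call back to the driver running on my machine and can't reach it over the network. I've been looking at the spark.driver.* settings from the configuration docs as a possible fix, along these lines (I haven't verified that this is actually the problem):

import org.apache.spark.SparkConf

// Same conf as above, plus explicit driver networking settings
val conf: SparkConf = new SparkConf()
  .setMaster("spark://master-ip:7077")
  .setAppName("HelloWorld")
  .set("spark.executor.memory", "1g")
  .set("spark.driver.memory", "12g")
  // Hostname or IP the executors should use to reach the driver;
  // must be resolvable and routable from the worker nodes
  .set("spark.driver.host", "driver-ip-reachable-from-workers")
  // Pin the driver RPC port so it can be opened in a firewall instead of being random
  .set("spark.driver.port", "7001")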
Any ideas?