4 votes

I'm using Apache Spark 2.2.1 running on an Amazon EMR cluster. Sometimes jobs fail with 'Futures timed out':

java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:401)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:254)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:764)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:762)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)

I changed two parameters in spark-defaults.conf:

spark.sql.broadcastTimeout 1000
spark.network.timeout 10000000

but it didn't help.
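For reference, the same two settings can also be passed at submit time instead of through spark-defaults.conf. The values below are just the ones from above, and <application-jar> is a placeholder for the job's jar:

spark-submit --master yarn --deploy-mode cluster --conf "spark.sql.broadcastTimeout=1000" --conf "spark.network.timeout=10000000" <application-jar>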

Do you have any suggestions on how to handle this timeout?

By looking at the trace, this does not seem related to a broadcast, so spark.sql.broadcastTimeout 1000 may not help. – jack

1 Answer

0 votes

Have you tried setting spark.yarn.am.waitTime?

Only used in cluster mode. Time for the YARN Application Master to wait for the SparkContext to be initialized.

The quote above is from the Spark "Running on YARN" configuration documentation. Note that its default is 100s, which matches the "100000 milliseconds" in your stack trace.

A bit more context on my situation:

I am using spark-submit to run a Java Spark job in cluster deploy mode. The driver performs a very long-running operation before the SparkContext is initialized, which was causing the timeout.
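For what it's worth, the reason this flag matters: in cluster mode the YARN Application Master runs your main method in a separate thread and waits up to spark.yarn.am.waitTime for the SparkContext to be created, which is exactly the await in the question's stack trace (ApplicationMaster.runDriver). A minimal Java sketch of the problematic shape, where the class name and the sleep are hypothetical stand-ins:

import org.apache.spark.sql.SparkSession;

// Hypothetical job illustrating the failure mode, not my actual code.
public class LongInitJob {
    public static void main(String[] args) throws Exception {
        // Long-running setup before the SparkContext exists. In cluster mode
        // the Application Master is already waiting for the context, so if
        // this exceeds spark.yarn.am.waitTime (default 100s) the AM fails
        // with "Futures timed out after [100000 milliseconds]".
        Thread.sleep(200_000); // stand-in for the expensive initialization

        // The AM stops waiting once the session (and its SparkContext) is
        // created, so creating it earlier, or raising spark.yarn.am.waitTime,
        // avoids the timeout.
        SparkSession spark = SparkSession.builder()
                .appName("long-init-job")
                .getOrCreate();

        spark.stop();
    }
}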

I got around it by:

spark-submit --master yarn --deploy-mode cluster --conf "spark.yarn.am.waitTime=600000" <application-jar> [application-arguments]
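For completeness, the same setting can live in spark-defaults.conf instead of the command line. 600000 here is milliseconds; Spark's time properties also accept duration strings such as 600s:

spark.yarn.am.waitTime 600000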