5
votes

I'm trying to run a Spark application written in Scala 2.11.8 on Spark 2.1, on an EMR cluster version 5.3.0. I configured the cluster with the following JSON:

[
  {
    "Classification": "hadoop-env", 
    "Configurations": [
        {
            "Classification": "export", 
            "Configurations": [], 
            "Properties": {
                "JAVA_HOME": "/usr/lib/jvm/java-1.8.0"
            }
        }
    ], 
    "Properties": {}
  }, 
  {
    "Classification": "spark-env", 
    "Configurations": [
        {
            "Classification": "export", 
            "Configurations": [], 
            "Properties": {
                "JAVA_HOME": "/usr/lib/jvm/java-1.8.0"
            }
        }
    ], 
    "Properties": {}
  }
]

If I run in client mode, everything runs just fine. When I try to run the application in cluster mode, it fails with exit code 12.

Here is part of the master log where I see the status code:

17/02/01 10:08:26 INFO TaskSetManager: Finished task 79.0 in stage 0.0 (TID 79) in 293 ms on ip-10-234-174-231.us-west-2.compute.internal (executor 2) (78/11102)
17/02/01 10:08:27 INFO YarnAllocator: Driver requested a total number of 19290 executor(s).
17/02/01 10:08:27 INFO ApplicationMaster: Final app status: FAILED, exitCode: 12, (reason: Exception was thrown 1 time(s) from Reporter thread.)
17/02/01 10:08:27 INFO SparkContext: Invoking stop() from shutdown hook

UPDATE:

As part of the job I need to read some data from S3, something like this: sc.textFile("s3n://stambucket/impressions/*/2017-01-0[1-9]/*/impression_recdate*"). If I only take one day, there are no errors, but with 9 days I get this exit code 12. It's even weirder considering that reading the 9 days in client mode runs just fine.
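For context, a minimal sketch of the two reads (the bucket and path are the ones above; only the date glob differs, a single day is shown as 2017-01-01 for illustration, and sc is the existing SparkContext):

// Single day -- completes without errors in both client and cluster mode
val oneDay = sc.textFile("s3n://stambucket/impressions/*/2017-01-01/*/impression_recdate*")

// Nine days -- this is the read that fails with exit code 12 in cluster mode
val nineDays = sc.textFile("s3n://stambucket/impressions/*/2017-01-0[1-9]/*/impression_recdate*")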

2
Googling suggests it means you're missing some jar files. This would fit with it working locally but not on a cluster. Check you've configured things so that the right jars are available everywhere. - The Archetypal Paul
emr-5.x already uses Java 8 by default, so your configurations are unnecessary. I don't think they would be the source of this problem, but I would recommend removing those configurations. - Jonathan Kelly
@TheArchetypalPaul If this is indeed the case, how am I supposed to know which jars are missing? Can you please share the reference you found? - NetanelRabinowitz
@JonathanKelly Your answer is a little surprising to me, since the EMR docs for Spark say that if you are writing a driver for submission in cluster mode, the driver will use Java 7, but setting the environment can ensure that the executors use Java 8, and in order to do that I need to set this configuration. - NetanelRabinowitz
@NetanelRabinowitz I just searched for "exit code 12" spark. Pretty much all the references I found that way (there aren't many) suggested it was jars. I don't know how one goes about identifying which are missing. - The Archetypal Paul

2 Answers

2
votes

Exit code 12 is a standard exit code in Linux to signal an out-of-memory condition.

Spark sets the default amount of memory per executor process (spark.executor.memory) to 1 GB. EMR won't override this value regardless of the amount of memory available on the cluster's nodes/master. One possible fix is to set the maximizeResourceAllocation flag to true.
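If you go that route, a sketch of what the extra classification could look like, in the same configuration JSON format used in the question (the "spark" classification is where EMR reads this flag):

[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    }
  }
]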

1
votes

Try increasing the ApplicationMaster's Java heap (spark.yarn.am.memory=2G), or set maxExecutors to a reasonable value (spark.dynamicAllocation.maxExecutors=400).

https://issues.apache.org/jira/browse/SPARK-19226
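
A minimal sketch of passing both settings at submit time by appending --conf flags to your spark-submit call (the application class and jar here are placeholders, not from the question):

# the application class and jar below are placeholders
spark-submit \
  --conf spark.yarn.am.memory=2G \
  --conf spark.dynamicAllocation.maxExecutors=400 \
  --class com.example.ImpressionJob \
  my-app.jar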