3
votes

I have a simple Hive query which works fine in yarn-client mode using the pyspark shell, whereas it throws the error below when I run it in yarn-cluster mode.

Exception in thread "Thread-6" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-6"
Exception in thread "Reporter" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Reporter" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkDriver-scheduler-1"

Cluster information: Hadoop 2.4, Spark 1.4.0-hadoop2.4, Hive 0.13.1. The script takes 10 columns from a Hive table, does some transformations, and writes them to a file.

> --num-executors 200 --executor-memory 8G --driver-memory 16G --executor-cores 3
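
For reference, the job is submitted with spark-submit roughly as follows (a sketch; the script name is a placeholder):

spark-submit --master yarn-cluster \
    --num-executors 200 --executor-memory 8G \
    --driver-memory 16G --executor-cores 3 \
    my_script.py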

Full stack trace:

py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o62.javaToPython.
: java.lang.OutOfMemoryError: PermGen space at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2570)
    at java.lang.Class.getDeclaredMethods(Class.java:1855)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:206)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:1891)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:683)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:682)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
    at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:682)
    at org.apache.spark.api.python.SerDeUtil$.javaToPython(SerDeUtil.scala:140)
    at org.apache.spark.sql.DataFrame.javaToPython(DataFrame.scala:1435)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
Some information about your cluster / local configuration and what is going on inside your script would be useful. - zero323
Added the cluster information to the question.
I am facing the same issue. How did you resolve it? - Chhavi Gangwal

2 Answers

1
vote

java.lang.OutOfMemoryError: PermGen space at java.lang.ClassLoader.defineClass1(...

You are likely running out of "permanent generation" heap space in the driver's JVM. This area is used to store classes. When running in cluster mode, the JVM needs to load more classes (I think this is because the Application Master runs inside the same JVM as the driver). To increase the PermGen area, add the following option:

--driver-java-options -XX:MaxPermSize=256M

See also https://plumbr.eu/outofmemoryerror/permgen-space


When using HiveContext in your Python program, I've found that the following option is also needed:

--files /usr/hdp/current/spark-client/conf/hive-site.xml

See also https://community.hortonworks.com/questions/27239/executing-spark-submit-with-yarn-cluster-mode-and.html


I also wanted to specify a particular version of Python to use, which requires another option:

--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/usr/local/bin/python2.7

See also https://issues.apache.org/jira/browse/SPARK-9235
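
Putting these together, a full submit command might look roughly like this (a sketch: the script name is a placeholder, and the Python and hive-site.xml paths depend on your installation):

spark-submit --master yarn-cluster \
    --driver-java-options "-XX:MaxPermSize=256M" \
    --files /usr/hdp/current/spark-client/conf/hive-site.xml \
    --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/usr/local/bin/python2.7 \
    my_script.py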

0
votes

A little addition to Mark's answer: sometimes Spark with a HiveContext complains about an OutOfMemoryError without any mention of PermGen, yet only -XX:MaxPermSize helps.

So if you are dealing with an OOM when Spark + HiveContext is used, also try -XX:MaxPermSize.
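
If you prefer to set it through Spark configuration instead of --driver-java-options, something like the following should be equivalent (and also covers the executors, in case the OOM happens there; 256M is just a starting point to tune):

--conf spark.driver.extraJavaOptions=-XX:MaxPermSize=256M \
--conf spark.executor.extraJavaOptions=-XX:MaxPermSize=256M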