Apache Spark: Garbage Collection Logs for Driver

Question

My Spark driver runs out of memory after running for about 10 hours with the error Exception in thread "dispatcher-event-loop-17" java.lang.OutOfMemoryError: GC overhead limit exceeded. To further debug, I enabled G1GC mode and also the GC logs option using spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties -XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp but it looks like it is not taking effect on the driver.

The job got stuck again on the driver after 10 hours and I dont see any GC logs under stdout on the driver node under /var/log/hadoop-yar/userlogs/[application-id]/[container-id]/stdout - so not sure where else to look. According to Spark GC tuning docs, it looks like these settings only happen on worker nodes (which I can see in this case as well as workers have GC logs in stdout after I had used the same configs under spark.executor.extraJavaOptions). Is there anyway to enable/acquire GC logs from the driver? Under Spark UI -> Environment, I see these options are listed under spark.driver.extraJavaOptions which is why I assumed it would be working.

Environment: The cluster is running on Google Dataproc and I use /usr/bin/spark-submit --master yarn --deploy-mode cluster ... from the master to submit jobs.

EDIT Setting the same options for the driver during the spark-submit command works and I am able to see the GC logs on stdout for the driver. Just that setting the options via SparkConf programmatically does not seem to take effect for some reason.

Patrick Clay Patrick Clay · Accepted Answer · 2017-07-25T17:30:55

I believe spark.driver.extraJavaOptions is handled by SparkSubmit.scala, and needs to be passed at invocation. To do that with Dataproc you can add that to the properties field (--properties in gcloud dataproc jobs submit spark).

Also instead of -Dlog4j.configuration=log4j.properties you can use this guide to configure detailed logging.

I could see GC driver logs with: gcloud dataproc jobs submit spark --cluster CLUSTER_NAME --class org.apache.spark.examples.SparkPi --jars file:///usr/lib/spark/examples/jars/spark-examples.jar --driver-log-levels ROOT=DEBUG --properties=spark.driver.extraJavaOptions="-XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp" --

You probably don't need --driver-log-levels ROOT=DEBUG, but can copy in your logging config from log4j.properties. If you really want to use log4j.properties, you can probably use --files log4j.properties

Apache Spark: Garbage Collection Logs for Driver

1 Answers