
I am building a log analysis platform to monitor Spark jobs on a YARN cluster, and I want to get a clear picture of Spark/YARN logging. I have searched a lot about this, and these are the points I am confused about:

  1. Does the directory specified in spark.eventLog.dir or spark.history.fs.logDirectory store all the application master logs, and can we customize those logs through log4j.properties in the Spark conf (a sketch of such a file follows these questions)?

  2. By default, all data nodes write their executor logs to a folder under /var/log/. With log aggregation enabled, can those executor logs end up in the spark.eventLog.dir location as well?
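
For reference, here is a minimal sketch of the kind of conf/log4j.properties I mean, based on Spark's bundled log4j.properties.template (the log level and the package in the last line are just illustrative):

    # Send everything logged at INFO or above to the console (stderr)
    log4j.rootCategory=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

    # Example: quiet down one chatty package (illustrative)
    log4j.logger.org.apache.spark.scheduler=WARN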

I've managed to set up a 3-node virtual Hadoop YARN cluster, with Spark installed on the master node. When I run Spark in client mode, I am thinking this node becomes the application master node. I'm a beginner in big data and appreciate any effort to help me clear up these confusions.


1 Answer


Spark's log4j logging is written to the YARN container stderr logs. The directory for these is controlled by the yarn.nodemanager.log-dirs configuration parameter (the default value on EMR is /var/log/hadoop-yarn/containers).
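
To illustrate, here is a minimal yarn-site.xml sketch of the properties involved; the paths are examples, not your cluster's actual values:

    <!-- Where each NodeManager writes container logs locally
         (one stdout/stderr pair per container) -->
    <property>
      <name>yarn.nodemanager.log-dirs</name>
      <value>/var/log/hadoop-yarn/containers</value>
    </property>

    <!-- With log aggregation on, NodeManagers copy a finished application's
         container logs to the remote (HDFS) directory below -->
    <property>
      <name>yarn.log-aggregation-enable</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.nodemanager.remote-app-log-dir</name>
      <value>/tmp/logs</value>
    </property>

Once aggregation is enabled, you can pull all container logs (driver and executors) for a finished application with:

    yarn logs -applicationId application_1234567890123_0001

(the application ID here is a placeholder; take the real one from the ResourceManager UI or the spark-submit output).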

(spark.eventLog.dir is only used by the Spark History Server to display the web UI after a job has finished. Spark writes events there, on persistent storage, that encode the information displayed in the UI.)
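
As a sketch, the event-log / History Server side is typically wired up like this in conf/spark-defaults.conf (the HDFS path is an example; create the directory before submitting jobs):

    # The driver writes job events here while the application runs
    spark.eventLog.enabled           true
    spark.eventLog.dir               hdfs:///spark-logs

    # The History Server reads the same directory to rebuild the web UI
    spark.history.fs.logDirectory    hdfs:///spark-logs

and the History Server itself is started with the script shipped in Spark's sbin directory:

    ./sbin/start-history-server.sh

Note that these event logs are a machine-readable record of the UI, not the log4j application logs; the two are separate and go to different places.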