
I am building a log analysis platform to monitor Spark jobs on a YARN cluster, and I want to get a clear picture of Spark/YARN logging. I have searched a lot about this, and these are the points I am confused about:

  1. Does the directory specified in spark.eventLog.dir or spark.history.fs.logDirectory store all the application master logs, and can we customize those logs through log4j.properties in the Spark conf (a sketch of such a file follows these questions)?

  2. By default, all data nodes write their executor logs to a folder under /var/log/. With log aggregation enabled, can those executor logs end up in the spark.eventLog.dir location as well?
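
For reference, here is a minimal sketch of the kind of conf/log4j.properties I mean, based on Spark's bundled log4j.properties.template (the log level and the package in the last line are just illustrative):

    # Send everything logged at INFO or above to the console (stderr)
    log4j.rootCategory=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

    # Example: quiet down one chatty package (illustrative)
    log4j.logger.org.apache.spark.scheduler=WARN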

I've managed to set up a 3-node virtual Hadoop YARN cluster, with Spark installed on the master node. When I run Spark in client mode, I am thinking this node becomes the application master node. I'm a beginner in big data and appreciate any effort to help me clear up these confusions.


1 Answer


Spark's log4j logging is written to the YARN container stderr logs. The directory for these is controlled by the yarn.nodemanager.log-dirs configuration parameter (the default value on EMR is /var/log/hadoop-yarn/containers).
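
To illustrate, here is a minimal yarn-site.xml sketch of the properties involved; the paths are examples, not your cluster's actual values:

    <!-- Where each NodeManager writes container logs locally
         (one stdout/stderr pair per container) -->
    <property>
      <name>yarn.nodemanager.log-dirs</name>
      <value>/var/log/hadoop-yarn/containers</value>
    </property>

    <!-- With log aggregation on, NodeManagers copy a finished application's
         container logs to the remote (HDFS) directory below -->
    <property>
      <name>yarn.log-aggregation-enable</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.nodemanager.remote-app-log-dir</name>
      <value>/tmp/logs</value>
    </property>

Once aggregation is enabled, you can pull all container logs (driver and executors) for a finished application with:

    yarn logs -applicationId application_1234567890123_0001

(the application ID here is a placeholder; take the real one from the ResourceManager UI or the spark-submit output).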

(spark.eventLog.dir is only used by the Spark History Server to display the web UI after a job has finished. Spark writes events there, on persistent storage, that encode the information displayed in the UI.)
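
As a sketch, the event-log / History Server side is typically wired up like this in conf/spark-defaults.conf (the HDFS path is an example; create the directory before submitting jobs):

    # The driver writes job events here while the application runs
    spark.eventLog.enabled           true
    spark.eventLog.dir               hdfs:///spark-logs

    # The History Server reads the same directory to rebuild the web UI
    spark.history.fs.logDirectory    hdfs:///spark-logs

and the History Server itself is started with the script shipped in Spark's sbin directory:

    ./sbin/start-history-server.sh

Note that these event logs are a machine-readable record of the UI, not the log4j application logs; the two are separate and go to different places.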