0 votes

Purpose - Store custom logs from a Spark Streaming application to an HDFS or UNIX directory.

I am running a Spark Streaming program in cluster mode, but logs are not getting written to the given log path; I checked both the HDFS and local directories. With the log4j debug property enabled I can see the files being rolled. Am I missing something?

--files log4j_driver.properties
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j_driver.properties -Dlog4j.debug=true "
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j_driver.properties -Dlog4j.debug=true"

My Log4j properties file -

# Base directory for the custom log
log=/tmp/cc

log4j.rootLogger=INFO,rolling

# Rolling file appender: 2KB per file, up to 10 backups
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.File=${log}/abc.log
log4j.appender.rolling.Append=true
log4j.appender.rolling.ImmediateFlush=true
log4j.appender.rolling.Threshold=debug
log4j.appender.rolling.maxFileSize=2KB
log4j.appender.rolling.maxBackupIndex=10
log4j.appender.rolling.encoding=UTF-8
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.conversionPattern=[%d] %p %m (%c)%n

# Keep Spark and Jetty at INFO
log4j.logger.org.apache.spark=INFO
log4j.logger.org.eclipse.jetty=INFO

Cluster Driver Log

log4j: Renaming file /tmp/cc/abc.log.2 to /tmp/cc/abc.log.3
log4j: Renaming file /tmp/cc/abc.log.1 to /tmp/cc/abc.log.2
log4j: Renaming file /tmp/cc/abc.log to /tmp/cc/abc.log.1
log4j: setFile called: /tmp/cc/abc.log, false
log4j: setFile ended
log4j: rolling over count=5141
log4j: maxBackupIndex=10
log4j: Renaming file /tmp/cc/abc.log.9 to /tmp/cc/abc.log.10
log4j: Renaming file /tmp/cc/abc.log.8 to /tmp/cc/abc.log.9
log4j: Renaming file /tmp/cc/abc.log.7 to /tmp/cc/abc.log.8
log4j: Renaming file /tmp/cc/abc.log.6 to /tmp/cc/abc.log.7

I read that we can specify ${spark.yarn.app.container.log.dir}/app.log in log4j, but I am not sure what the default path for this property is, or whether we need to set it manually as well. When I was running this application in client mode, logs were written to the local directory perfectly.
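
The Spark-on-YARN documentation suggests pointing the appender at the container log directory; a minimal sketch, keeping the rest of the appender configuration above unchanged:

# ${spark.yarn.app.container.log.dir} is set by Spark on YARN inside each
# container JVM; it resolves to that container's log directory under
# yarn.nodemanager.log-dirs, so the files are picked up by YARN log aggregation.
log4j.appender.rolling.File=${spark.yarn.app.container.log.dir}/app.log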

Comments
"I am running spark streaming program in cluster mode" <-- Can you show the command line that you use to execute the Spark app? - Jacek Laskowski
spark2-submit --master yarn --deploy-mode cluster - Elvish_Blade

3 Answers

0 votes

In my YARN cluster, the logs of a Spark Streaming application are written on the node that runs the application container. There is a directory for the application's logs, configured by a property named something like yarn.log.directory? I don't remember the precise name, so you can check it out.
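
The half-remembered property is likely yarn.nodemanager.log-dirs, which sets the NodeManager's local container-log location. One way to check it on a cluster node, assuming the common /etc/hadoop/conf configuration path:

# /etc/hadoop/conf is an assumed location; adjust for your distribution.
grep -A1 "yarn.nodemanager.log-dirs" /etc/hadoop/conf/yarn-site.xml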

0 votes

When you start a Spark application in cluster mode (--deploy-mode cluster), log=/tmp/cc points to a /tmp/cc directory under the root of the "containers" that run the driver and the executors. Those containers live on machines in the cluster.

In your case, you have to find the machines that ran the driver and the executors, and look for the directory there.
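
One way to locate the driver's node on YARN is the application status report, whose "AM Host" field names the machine running the ApplicationMaster (and, in cluster mode, the driver); the application id below is a placeholder:

# Placeholder application id; look for the "AM Host" field in the output.
yarn application -status application_1234567890123_0042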

Since it is very cumbersome to manage logs in a distributed environment like Spark, the cluster managers supported by Spark (i.e. Hadoop YARN, Apache Mesos, Kubernetes) can collect the logs from the machines and make them available through a web UI or a command line for download. In YARN, it'd be yarn logs -applicationId.
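
A minimal sketch of fetching the aggregated logs, assuming log aggregation is enabled on the cluster (the application id is a placeholder):

# Requires yarn.log-aggregation-enable=true; works once the app has finished.
yarn logs -applicationId application_1234567890123_0042 > app_logs.txt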

-2 votes

The best option for finding where Spark logs are written is the Spark UI; in cluster mode, the driver logs are on one of the cluster nodes.

The Spark UI gives lots of info. The post at http://ashkrit.blogspot.com/2018/11/insights-from-spark-ui.html has some of the details.