1 vote

I have a multi-node Spark cluster. I am creating logs using log4j. The logs are getting created, but on all the nodes in the cluster. They are also getting created in the /tmp directory and not in any other directory. This is my submit command:

spark2-submit --master yarn --deploy-mode cluster --files /path/log4j.properties --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" --class com.dataLoad.Load_WF /path/LoadData.jar

How do I append all the logs to one log file instead of multiple log files? And how do I create the logs in a directory other than /tmp on Linux? Sample code would be very helpful for understanding. Much appreciated.


1 Answer

0 votes

On a multi-node Spark cluster, the logs of your application are written by the Spark driver:

  • if you submit in client mode from node A, the logs are written on node A;

  • if you submit in cluster mode, the logs are written on whichever node the Spark driver happens to run (see the sketch after this list).
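A quick sketch of the difference, reusing the jar from the question (the omitted --files/--conf flags stay exactly the same as in your command):

# Client mode: the driver runs on the machine you submit from,
# so the driver log ends up on that machine
spark2-submit --master yarn --deploy-mode client ... --class com.dataLoad.Load_WF /path/LoadData.jar

# Cluster mode: YARN starts the driver on an arbitrary node,
# so the driver log ends up on that node
spark2-submit --master yarn --deploy-mode cluster ... --class com.dataLoad.Load_WF /path/LoadData.jar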

We had the same problem. The solution we found was to use syslog to centralize the logs of every node, for all of our applications, on a single node.

On the main node, configure syslog to act as the log server. In /etc/syslog-ng/, edit syslog-ng.conf to add a network source and a destination that saves the centralized log files. For example:

@version: 3.5
@include "scl.conf"
@include "`scl-root`/system/tty10.conf"

options {
    time-reap(30);
    mark-freq(10);
    keep-hostname(yes);
};

# Messages generated on this node itself
source s_local { system(); internal(); };

# Messages received from the other cluster nodes over UDP
source s_network {
    syslog(transport(udp) port(514));
};

# One dated file collecting all the application logs
destination df_local2 {
    file(
        "/var/log/MyClusterLogs/myAppLogs.$YEAR-$MONTH-$DAY.log"
        owner("user")
        group("user")
        perm(0777)
    );
};

# Keep only messages sent to the local2 facility
filter f_local2 { facility(local2); };

# Wire the network source through the filter into the file
log { source(s_network); filter(f_local2); destination(df_local2); };
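Before reloading, make sure the destination directory exists, then restart syslog-ng (a sketch assuming a systemd-based distribution; the service name and ownership may differ on yours):

mkdir -p /var/log/MyClusterLogs
chown user:user /var/log/MyClusterLogs
systemctl restart syslog-ng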

Then change the log4j.properties file of the Spark application to define the FILE appender and point the SYSLOG appender at the syslog server (the FILE path below is an example, pick any directory you like):

log4j.rootCategory=INFO,FILE,SYSLOG
# FILE: local file appender; the path is an example, not /tmp
log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File=/var/log/MyClusterLogs/myApp.log
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2} %x - %m%n
# SYSLOG: forwards every event to the central syslog-ng server over UDP 514
log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.syslogHost=<syslog_server_ip>
log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
log4j.appender.SYSLOG.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2} %x - %m%n
log4j.appender.SYSLOG.Facility=LOCAL2
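
Finally, ship this log4j.properties to the driver and the executors exactly as in your submit command, so every JVM picks it up:

spark2-submit --master yarn --deploy-mode cluster --files /path/log4j.properties --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" --class com.dataLoad.Load_WF /path/LoadData.jar

With that in place, every driver and executor JVM sends its events to the local2 facility of the central syslog-ng server, which appends them all to one dated file under /var/log/MyClusterLogs.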