
I have a properties file that I am shipping to spark-submit using --files in YARN cluster mode.

$ cat testprop.prop
name:aiman
country:india

I intend to read a property value from this file and print it to the console using a log4j logger.
I submit the job with --files as follows:

spark-submit \
--class org.main.ReadLocalFile \
--master yarn \
--deploy-mode cluster \
--files testprop.prop#testprop.prop \
spark_cluster_file_read-0.0.1.jar

The job runs to completion with a SUCCEEDED status, but I cannot see any output on the console.
I can read testprop.prop and print its contents when running in client mode, but not in cluster mode, so I suspect console logging does not work in cluster mode. How should I log to the console in that case?
Here is the code I am using:

package org.main;

import java.io.InputStream;
import java.util.Properties;

import org.apache.log4j.ConsoleAppender;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;
import org.apache.spark.sql.SparkSession;
import org.xml.sax.InputSource;
import scala.xml.Source;

public class ReadLocalFile {
    public static void main(String[] args) throws Exception
    {
        final Logger log = LogManager.getLogger(ReadLocalFile.class);
        // A ConsoleAppender needs a layout, otherwise log4j reports
        // "No layout set for the appender" and drops the messages.
        ConsoleAppender logConsole = new ConsoleAppender(new PatternLayout("%d %-5p %c - %m%n"));
        logConsole.activateOptions();
        log.addAppender(logConsole);

        SparkSession spark = SparkSession.builder()
                .master("yarn")
                .config("spark.submit.deployMode", "cluster")
                .getOrCreate();

        Properties prop = new Properties();
        InputStream in = null;
        try {
            // "testprop.prop" resolves against the container's working
            // directory, which is where --files places the shipped file.
            InputSource propFile = Source.fromFile("testprop.prop");
            in = propFile.getByteStream();
            prop.load(in);
        }
        catch (Exception e) {
            e.printStackTrace();
            log.error("=========Exception Thrown============");
            System.exit(1);
        }
        finally {
            if (in != null) in.close();
        }

        log.info("====================Value: " + prop.getProperty("name"));
        spark.close();
    }

}

And the logs are:

SPARK_MAJOR_VERSION is set to 2, using Spark2
19/07/25 07:59:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/07/25 07:59:51 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
19/07/25 07:59:51 INFO O: Set a new configuration for the first time.
19/07/25 07:59:51 INFO d: Method not implemented in this version of Hadoop: org.apache.hadoop.fs.FileSystem$Statistics.getBytesReadLocalHost
19/07/25 07:59:51 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
19/07/25 07:59:51 INFO u: Scheduling statistics report every 2000 millisecs
19/07/25 07:59:52 INFO RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
19/07/25 07:59:52 INFO RequestHedgingRMFailoverProxyProvider: Found active RM [rm2]
19/07/25 07:59:52 INFO Client: Requesting a new application from cluster with 24 NodeManagers
19/07/25 07:59:52 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (102400 MB per container)
19/07/25 07:59:52 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
19/07/25 07:59:52 INFO Client: Setting up container launch context for our AM
19/07/25 07:59:52 INFO Client: Setting up the launch environment for our AM container
19/07/25 07:59:52 INFO Client: Preparing resources for our AM container
19/07/25 07:59:52 INFO HadoopFSCredentialProvider: getting token for: hdfs://meldstg/user/myserviceuser
19/07/25 07:59:52 INFO DFSClient: Created HDFS_DELEGATION_TOKEN token 7451415 for myserviceuser on ha-hdfs:meldstg
19/07/25 07:59:54 INFO metastore: Trying to connect to metastore with URI thrift://XXX.XXX.XXX:9083
19/07/25 07:59:54 INFO metastore: Connected to metastore.
19/07/25 07:59:55 INFO HiveCredentialProvider: Get Token from hive metastore: Kind: HIVE_DELEGATION_TOKEN, Service: , Ident: 00 1a 65 62 64 70 62 75 73 73 40 43 41 42 4c 45 2e 43 4f 4d 43 41 53 54 2e 43 4f 4d 04 68 69 76 65 00 8a 01 6c 28 24 c8 e0 8a 01 6c 4c 31 4c e0 8e 82 98 8e 03 08
19/07/25 07:59:55 INFO Client: Use hdfs cache file as spark.yarn.archive for HDP, hdfsCacheFile:hdfs://meldstg/hdp/apps/2.6.3.20-2/spark2/spark2-hdp-yarn-archive.tar.gz
19/07/25 07:59:55 INFO Client: Source and destination file systems are the same. Not copying hdfs://meldstg/hdp/apps/2.6.3.20-2/spark2/spark2-hdp-yarn-archive.tar.gz
19/07/25 07:59:55 INFO Client: Uploading resource file:/home/myserviceuser/aiman/spark_cluster_file_read-0.0.1-SNAPSHOT-jar-with-dependencies.jar -> hdfs://meldstg/user/myserviceuser/.sparkStaging/application_1563540853319_78111/spark_cluster_file_read-0.0.1-SNAPSHOT-jar-with-dependencies.jar
19/07/25 07:59:56 INFO Client: Uploading resource file:/home/myserviceuser/aiman/testprop.prop#testprop.prop -> hdfs://meldstg/user/myserviceuser/.sparkStaging/application_1563540853319_78111/testprop.prop
19/07/25 07:59:56 INFO Client: Uploading resource file:/tmp/spark-bcf53d4d-1bac-47f4-87d6-2e35c0e8b501/__spark_conf__7386751978371777143.zip -> hdfs://meldstg/user/myserviceuser/.sparkStaging/application_1563540853319_78111/__spark_conf__.zip
19/07/25 07:59:56 INFO SecurityManager: Changing view acls to: myserviceuser
19/07/25 07:59:56 INFO SecurityManager: Changing modify acls to: myserviceuser
19/07/25 07:59:56 INFO SecurityManager: Changing view acls groups to:
19/07/25 07:59:56 INFO SecurityManager: Changing modify acls groups to:
19/07/25 07:59:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(myserviceuser); groups with view permissions: Set(); users  with modify permissions: Set(myserviceuser); groups with modify permissions: Set()
19/07/25 07:59:56 INFO Client: Submitting application application_1563540853319_78111 to ResourceManager
19/07/25 07:59:56 INFO YarnClientImpl: Submitted application application_1563540853319_78111
19/07/25 07:59:57 INFO Client: Application report for application_1563540853319_78111 (state: ACCEPTED)
19/07/25 07:59:57 INFO Client:
         client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
         diagnostics: AM container is launched, waiting for AM container to Register with RM
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: orion
         start time: 1564041596720
         final status: UNDEFINED
         tracking URL: http://XXXX.XXXX.XXX/proxy/application_1563540853319_78111/
         user: myserviceuser
19/07/25 07:59:58 INFO Client: Application report for application_1563540853319_78111 (state: ACCEPTED)
19/07/25 07:59:59 INFO Client: Application report for application_1563540853319_78111 (state: ACCEPTED)
19/07/25 08:00:00 INFO Client: Application report for application_1563540853319_78111 (state: ACCEPTED)
19/07/25 08:00:01 INFO Client: Application report for application_1563540853319_78111 (state: ACCEPTED)
19/07/25 08:00:02 INFO Client: Application report for application_1563540853319_78111 (state: ACCEPTED)
19/07/25 08:00:03 INFO Client: Application report for application_1563540853319_78111 (state: ACCEPTED)
19/07/25 08:00:04 INFO Client: Application report for application_1563540853319_78111 (state: RUNNING)
19/07/25 08:00:04 INFO Client:
         client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
         diagnostics: N/A
         ApplicationMaster host: XXX.XXX.XXX.XXX
         ApplicationMaster RPC port: 0
         queue: orion
         start time: 1564041596720
         final status: UNDEFINED
         tracking URL: http://XXXX.XXXX.XXX/proxy/application_1563540853319_78111/
         user: myserviceuser
19/07/25 08:00:05 INFO Client: Application report for application_1563540853319_78111 (state: RUNNING)
19/07/25 08:00:06 INFO Client: Application report for application_1563540853319_78111 (state: RUNNING)
19/07/25 08:00:07 INFO Client: Application report for application_1563540853319_78111 (state: RUNNING)
19/07/25 08:00:08 INFO Client: Application report for application_1563540853319_78111 (state: RUNNING)
19/07/25 08:00:09 INFO Client: Application report for application_1563540853319_78111 (state: RUNNING)
19/07/25 08:00:10 INFO Client: Application report for application_1563540853319_78111 (state: RUNNING)
19/07/25 08:00:11 INFO Client: Application report for application_1563540853319_78111 (state: RUNNING)
19/07/25 08:00:12 INFO Client: Application report for application_1563540853319_78111 (state: RUNNING)
19/07/25 08:00:13 INFO Client: Application report for application_1563540853319_78111 (state: FINISHED)
19/07/25 08:00:13 INFO Client:
         client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
         diagnostics: N/A
         ApplicationMaster host: XXX.XXX.XXX.XXX
         ApplicationMaster RPC port: 0
         queue: orion
         start time: 1564041596720
         final status: SUCCEEDED
         tracking URL: http://XXXX.XXXX.XXX/proxy/application_1563540853319_78111/
         user: myserviceuser
19/07/25 08:00:14 INFO ShutdownHookManager: Shutdown hook called
19/07/25 08:00:14 INFO ShutdownHookManager: Deleting directory /tmp/spark-bcf53d4d-1bac-47f4-87d6-2e35c0e8b501

Where am I going wrong?

Have you tried reading it with SparkFiles.get("testprop.prop"), which gives you the path of the file? (See the sketch after these comments.) - koiralo
I am able to read the file using Source.fromFile(). The issue is that I am not able to display it on the console. Let me modify the heading of this post. - aiman
If you are running in cluster mode, go to the Spark history server and look at the stdout logs; that is where it would be printed, along with any other println statements. I have not seen log output on the console for a Spark application in cluster mode. - Aaron
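
For reference, here is a minimal sketch of koiralo's suggestion; the class and method names are made up for illustration, and it assumes the file was shipped with --files so that SparkFiles.get() can resolve its local path inside the container:

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Properties;

import org.apache.spark.SparkFiles;

public class ReadShippedProps {
    // Hypothetical helper: load a file shipped via --files as java.util.Properties.
    static Properties load(String name) throws Exception {
        Properties prop = new Properties();
        // SparkFiles.get() returns the absolute local path of the shipped file.
        try (InputStream in = new FileInputStream(SparkFiles.get(name))) {
            prop.load(in);
        }
        return prop;
    }
}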

1 Answer


You can't print to the console in cluster mode, because the driver does not run on the machine from which the application is launched; it runs inside a YARN container on some cluster node. You will have to check the logs through YARN (the ResourceManager UI or the history server).
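
Assuming YARN log aggregation is enabled on your cluster, you can fetch the aggregated container logs (including the driver's stdout and the log4j output) after the job finishes, using the application id from your submission log:

yarn logs -applicationId application_1563540853319_78111

The log.info output will appear in the driver container's section of that dump. Alternatively, follow the tracking URL to the ResourceManager UI and open the stdout/stderr links of the driver container.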