10
votes

I am fairly new to Spark. I searched but could not find a proper solution. I have installed Hadoop 2.7.2 on two boxes (one master node and one worker node) and set up the cluster by following this guide: http://javadev.org/docs/hadoop/centos/6/installation/multi-node-installation-on-centos-6-non-sucure-mode/ I was running the Hadoop and Spark applications as the root user to test the cluster.

I have installed Spark on the master node, and Spark starts without any errors. However, when I submit a job using spark-submit, I get a FileNotFoundException even though the file is present on the master node at the exact location given in the error. I am running the spark-submit command below; the log output follows the command.

/bin/spark-submit --class com.test.Engine --master yarn --deploy-mode cluster /app/spark-test.jar
16/04/21 19:16:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/21 19:16:13 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/04/21 19:16:14 INFO Client: Requesting a new application from cluster with 1 NodeManagers
16/04/21 19:16:14 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/04/21 19:16:14 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/04/21 19:16:14 INFO Client: Setting up container launch context for our AM
16/04/21 19:16:14 INFO Client: Setting up the launch environment for our AM container
16/04/21 19:16:14 INFO Client: Preparing resources for our AM container
16/04/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file:/mi/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar
16/04/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file:/app/spark-test.jar
16/04/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file:/tmp/spark-120aeddc-0f87-4411-9400-22ba01096249/__spark_conf__5619348744221830008.zip
16/04/21 19:16:14 INFO SecurityManager: Changing view acls to: root
16/04/21 19:16:14 INFO SecurityManager: Changing modify acls to: root
16/04/21 19:16:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/04/21 19:16:15 INFO Client: Submitting application 1 to ResourceManager
16/04/21 19:16:15 INFO YarnClientImpl: Submitted application application_1461246306015_0001
16/04/21 19:16:16 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
16/04/21 19:16:16 INFO Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1461246375622
     final status: UNDEFINED
     tracking URL: http://sparkcluster01.testing.com:8088/proxy/application_1461246306015_0001/
     user: root
16/04/21 19:16:17 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
16/04/21 19:16:18 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
16/04/21 19:16:19 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
16/04/21 19:16:20 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
16/04/21 19:16:21 INFO Client: Application report for application_1461246306015_0001 (state: FAILED)
16/04/21 19:16:21 INFO Client: 
     client token: N/A
     diagnostics: Application application_1461246306015_0001 failed 2 times due to AM Container for appattempt_1461246306015_0001_000002 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://sparkcluster01.testing.com:8088/cluster/app/application_1461246306015_0001Then, click on links to logs of each attempt.
Diagnostics: java.io.FileNotFoundException: File file:/app/spark-test.jar does not exist
Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1461246375622
     final status: FAILED
     tracking URL: http://sparkcluster01.testing.com:8088/cluster/app/application_1461246306015_0001
     user: root
Exception in thread "main" org.apache.spark.SparkException: Application application_1461246306015_0001 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I also tried placing my application on HDFS and giving the HDFS path in the spark-submit command. Even then it throws a FileNotFoundException, this time for a Spark conf file. I am running the spark-submit command below; the log output follows the command.

./bin/spark-submit --class com.test.Engine --master yarn --deploy-mode cluster hdfs://sparkcluster01.testing.com:9000/beacon/job/spark-test.jar
16/04/21 18:11:45 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/04/21 18:11:46 INFO Client: Requesting a new application from cluster with 1 NodeManagers
16/04/21 18:11:46 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/04/21 18:11:46 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/04/21 18:11:46 INFO Client: Setting up container launch context for our AM
16/04/21 18:11:46 INFO Client: Setting up the launch environment for our AM container
16/04/21 18:11:46 INFO Client: Preparing resources for our AM container
16/04/21 18:11:46 INFO Client: Source and destination file systems are the same. Not copying file:/mi/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar
16/04/21 18:11:47 INFO Client: Uploading resource hdfs://sparkcluster01.testing.com:9000/beacon/job/spark-test.jar -> file:/root/.sparkStaging/application_1461234217994_0017/spark-test.jar
16/04/21 18:11:49 INFO Client: Source and destination file systems are the same. Not copying file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip
16/04/21 18:11:50 INFO SecurityManager: Changing view acls to: root
16/04/21 18:11:50 INFO SecurityManager: Changing modify acls to: root
16/04/21 18:11:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/04/21 18:11:50 INFO Client: Submitting application 17 to ResourceManager
16/04/21 18:11:50 INFO YarnClientImpl: Submitted application application_1461234217994_0017
16/04/21 18:11:51 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED)
16/04/21 18:11:51 INFO Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1461242510849
     final status: UNDEFINED
     tracking URL: http://sparkcluster01.testing.com:8088/proxy/application_1461234217994_0017/
     user: root
16/04/21 18:11:52 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED)
16/04/21 18:11:53 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED)
16/04/21 18:11:54 INFO Client: Application report for application_1461234217994_0017 (state: FAILED)
16/04/21 18:11:54 INFO Client: 
     client token: N/A
     diagnostics: Application application_1461234217994_0017 failed 2 times due to AM Container for appattempt_1461234217994_0017_000002 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://sparkcluster01.testing.com:8088/cluster/app/application_1461234217994_0017Then, click on links to logs of each attempt.
Diagnostics: File file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip does not exist
java.io.FileNotFoundException: File file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1461242510849
     final status: FAILED
     tracking URL: http://sparkcluster01.testing.com:8088/cluster/app/application_1461234217994_0017
     user: root
Exception in thread "main" org.apache.spark.SparkException: Application application_1461234217994_0017 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/04/21 18:11:55 INFO ShutdownHookManager: Shutdown hook called
16/04/21 18:11:55 INFO ShutdownHookManager: Deleting directory /tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21
4
It's nice you included all this information, but wouldn't the Spark code itself be useful to diagnose the problem? - OneCricketeer
@cricket_007 The issue is not due to the Spark code, because I receive the same error even if I run spark-shell or any of the Spark examples on YARN, e.g. spark-shell --master yarn-client - Dev Loper
Could you add your YARN logs, please? You can get them by running $ yarn logs -applicationId application_1461246306015_0001 - user1314742
@user1314742, I think the Spark application itself is not being executed. I get this exception soon after I submit my job. When I run the yarn logs command, it says log aggregation for this job has not started yet. I think it has something to do with my Spark and Hadoop setup or configuration. - Dev Loper
@cricket_007 That's how Spark prints the file location. It has nothing to do with the error. My Hadoop configuration directory was pointing to the wrong location, which resulted in this issue. - Dev Loper

4 Answers

8
votes

The Spark configuration was not pointing to the right Hadoop configuration directory. For Hadoop 2.7.2 the configuration resides under etc/hadoop/ in the Hadoop installation directory rather than under /root/hadoop2.7.2/conf. When I set HADOOP_CONF_DIR=/root/hadoop2.7.2/etc/hadoop/ in spark-env.sh, spark-submit started working and the FileNotFoundException disappeared. Earlier it was pointing to /root/hadoop2.7.2/conf (which does not exist). If Spark does not point to the proper Hadoop configuration directory, it may produce a similar error. I think this is probably a bug in Spark; it should handle the situation gracefully rather than throwing an ambiguous error message.
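
A minimal sketch of how the fix above could be checked before submitting a job, assuming the installation path from this answer. `check_hadoop_conf_dir` is a hypothetical helper, not part of Spark or Hadoop; it only verifies that the directory exists and contains core-site.xml and yarn-site.xml, the files that normally define the default file system and the ResourceManager address.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Spark or Hadoop): succeeds only if the
# given directory exists and contains core-site.xml and yarn-site.xml.
check_hadoop_conf_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir"
        return 1
    fi
    for f in core-site.xml yarn-site.xml; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing file: $dir/$f"
            return 1
        fi
    done
    echo "ok: $dir"
}

# The setting itself goes into conf/spark-env.sh (path taken from the answer;
# adjust it to your own installation):
#   export HADOOP_CONF_DIR=/root/hadoop2.7.2/etc/hadoop/
check_hadoop_conf_dir "${HADOOP_CONF_DIR:-/root/hadoop2.7.2/etc/hadoop}" || true
```

In the question's logs, the client connects to the ResourceManager at 0.0.0.0:8032 and treats file: paths as the staging file system, which is consistent with Hadoop defaults being used because no site configuration was found.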

0
votes

I got a similar error with Spark running on EMR. I had written my Spark code in Java 8, but on the EMR cluster Spark was not running on Java 8 by default. I had to recreate the cluster with JAVA_HOME pointing to the Java 8 version, which resolved my problem. Please check along similar lines.

0
votes

I had a similar issue, but the problem was caused by having two core-site.xml files, one in $HADOOP_CONF_DIR and the other in $SPARK_HOME/conf. The problem disappeared when I removed the one under $SPARK_HOME/conf.
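
A small sketch for spotting the duplicate described above, assuming both environment variables are set in your shell. `find_core_site_copies` is a hypothetical helper name; it only lists the copies it finds and does not delete anything.

```shell
#!/bin/sh
# Hypothetical helper: prints the path of every core-site.xml found directly
# inside the directories passed as arguments.
find_core_site_copies() {
    for dir in "$@"; do
        # Skip arguments that are empty because a variable was unset.
        [ -n "$dir" ] || continue
        if [ -f "$dir/core-site.xml" ]; then
            echo "$dir/core-site.xml"
        fi
    done
}

# More than one line of output means two copies may be competing.
find_core_site_copies "${HADOOP_CONF_DIR:-}" "${SPARK_HOME:-}/conf"
```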

0
votes

Whenever you run in YARN cluster mode, a local file must be present on all the nodes, because we don't know in advance which node will host the AM (Application Master). Your application always looks for the file on the AM node.

I had a situation where I had to keep the KeyStore and KeyPass passwords in a file that my Spark job reads at runtime. I kept the file under the /opt folder as /opt/datapipeline/config/keystorePass, but my application kept failing with a FileNotFoundException.

After I placed the keystorePass file on all the nodes, the exception was gone and the job succeeded.

Another option is to keep the file in HDFS rather than on the local file system.
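
A rough sketch of the HDFS option, assuming the `hdfs` command is on the PATH and that /datapipeline/config is an acceptable HDFS target directory (that directory and the helper name `hdfs_upload_commands` are made up for illustration). The helper only prints the commands so they can be reviewed first; pipe the output to `sh` to run them.

```shell
#!/bin/sh
# Hypothetical helper: prints the HDFS commands that would create the target
# directory and copy one local file into it. Nothing is executed here.
hdfs_upload_commands() {
    local_file="$1"
    hdfs_dir="$2"
    echo "hdfs dfs -mkdir -p $hdfs_dir"
    echo "hdfs dfs -put $local_file $hdfs_dir/"
}

# Local path from the answer above; the HDFS directory is an assumption.
hdfs_upload_commands /opt/datapipeline/config/keystorePass /datapipeline/config
```

The job would then need to read the file through an HDFS path instead of a local path, so it no longer matters which node hosts the Application Master.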