
I have a spark application that runs as expected on one node.

I am now using YARN to run this across multiple nodes. However, it fails with a file-not-found exception. I first changed the file path from relative to absolute, but the error persisted. I then read here that it may be necessary to prefix the path with file:// in case the default scheme is HDFS. The file in question is JSON.

Despite using the absolute path and prefixing it with file://, the error persists:

16/11/10 10:19:56 INFO yarn.Client: client token: N/A diagnostics: User class threw exception: java.io.FileNotFoundException: file://absolute/dir/file.json (No such file or directory)

Why does this work correctly with one node but not in cluster mode with yarn?

Is this file present in all the nodes? – Nirmal Ram
No, it is present on one node. I had also tried the node address, i.e. file://me@server/dir/file.json – LearningSlowly
You'd either need your file to be on a distributed FS like HDFS, or on all worker nodes of your cluster under the same location. – facha
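Following the comments: in cluster mode the driver and executors can land on any node, so a file that exists on only one machine won't be found. One common workaround is to ship the file alongside the application with spark-submit's --files option, which localizes it into each container's working directory. A minimal sketch (the jar name and paths are hypothetical examples, not from the question):

```shell
# Ship the local JSON file to every YARN container's working directory,
# so the application can open it by its bare filename ("file.json").
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files /absolute/dir/file.json \
  my-spark-app.jar
```

Inside the application the file is then read by its bare name rather than the original absolute path, since YARN places a copy in each container's working directory.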

1 Answer


You're missing a slash /. Try:

file:///absolute/dir/file.json

The file:// prefix specifies the local file system, not HDFS. In a URI, the part between the second and third slash is the authority (host); leaving it empty means "local host", and the absolute path then starts with its own leading slash, giving three forward slashes in total. With only two slashes, the first path segment is misread as a host name, which is why your path failed.
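You can see this parsing behavior directly with Python's standard urllib.parse (shown only to illustrate the URI rule, not Spark itself):

```python
from urllib.parse import urlparse

# Two slashes: "absolute" is parsed as the host, not part of the path.
two = urlparse("file://absolute/dir/file.json")
print(two.netloc, two.path)   # absolute /dir/file.json

# Three slashes: empty host (local machine), full absolute path preserved.
three = urlparse("file:///absolute/dir/file.json")
print(three.netloc, three.path)   # netloc is empty, path is /absolute/dir/file.json
```

Hadoop's own path resolution treats the URI the same way, which is why file://absolute/dir/file.json ends up looking for /dir/file.json on a host named "absolute".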