
I am having an issue running a Hadoop job: I get a FileNotFoundException when trying to retrieve a file from the DistributedCache, even though the file exists. When I run the same job on my local file system, it works.

The cluster is hosted on Amazon Web Services, using Hadoop version 1.0.4 and Java version 1.7. I don't have any control over the cluster, or how it's set up.

In the main function I add the file to the distributed cache. This seems to work fine; at least it doesn't throw any exceptions.

....
JobConf conf = new JobConf(Driver.class);
conf.setJobName("mean");
conf.set("lookupfile", args[2]);
Job job = new Job(conf);
DistributedCache.addCacheFile(new Path(args[2]).toUri(), conf);
...

In the setup function, which is called before map, I create a Path for the file and call a function that loads the file into a hash map.

Configuration conf = context.getConfiguration();
String inputPath = conf.get("lookupfile");                          
Path dataFile = new Path(inputPath);
loadHashMap(dataFile, context);

The exception occurs on the first line of the function that loads the hash map.

brReader = new BufferedReader(new FileReader(filePath.toString()));
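
For context, loadHashMap does roughly the following (a simplified sketch; the CSV splitting and the lookupMap field are just how I populate the map):

private void loadHashMap(Path filePath, Context context) throws IOException {
    BufferedReader brReader = null;
    try {
        // this is the line that throws the FileNotFoundException (Map.java:49 in the trace below)
        brReader = new BufferedReader(new FileReader(filePath.toString()));
        String line;
        while ((line = brReader.readLine()) != null) {
            // split each CSV line and store key/value in the map (columns are illustrative)
            String[] fields = line.split(",");
            lookupMap.put(fields[0], fields[1]);
        }
    } finally {
        if (brReader != null) {
            brReader.close();
        }
    }
}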

I start the job like so.

hadoop jar Driver.jar Driver /tmp/input output /tmp/DATA.csv

I get the following error:

Error: Found class org.apache.hadoop.mapreduce.Counter, but interface was expected
attempt_201410300715_0018_m_000000_0: java.io.FileNotFoundException: /tmp/DATA.csv (No such file or directory)
attempt_201410300715_0018_m_000000_0:   at java.io.FileInputStream.open(Native Method)
attempt_201410300715_0018_m_000000_0:   at java.io.FileInputStream.<init>(FileInputStream.java:146)
attempt_201410300715_0018_m_000000_0:   at java.io.FileInputStream.<init>(FileInputStream.java:101)
attempt_201410300715_0018_m_000000_0:   at java.io.FileReader.<init>(FileReader.java:58)
attempt_201410300715_0018_m_000000_0:   at Map.loadHashMap(Map.java:49)
attempt_201410300715_0018_m_000000_0:   at Map.setup(Map.java:98)
attempt_201410300715_0018_m_000000_0:   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
attempt_201410300715_0018_m_000000_0:   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
attempt_201410300715_0018_m_000000_0:   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
attempt_201410300715_0018_m_000000_0:   at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
attempt_201410300715_0018_m_000000_0:   at java.security.AccessController.doPrivileged(Native Method)
attempt_201410300715_0018_m_000000_0:   at javax.security.auth.Subject.doAs(Subject.java:415)
attempt_201410300715_0018_m_000000_0:   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1140)
attempt_201410300715_0018_m_000000_0:   at org.apache.hadoop.mapred.Child.main(Child.java:253)
14/11/01 02:12:49 INFO mapred.JobClient: Task Id : attempt_201410300715_0018_m_000001_0, Status : FAILED

I've verified that the file exists, both in HDFS and on the local file system.

hadoop@hostname:~$ hadoop fs -ls /tmp
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2014-10-30 11:19 /tmp/input
-rw-r--r--   1 hadoop supergroup     428796 2014-10-30 11:19 /tmp/DATA.csv

hadoop@hostname:~$ ls -al /tmp/
-rw-r--r--  1 hadoop hadoop 428796 Oct 30 11:30 DATA.csv

I honestly can't see what's wrong here. The exception lists the correct path for the file. I've verified that the file exists on both HDFS and the local file system. Is there something I am missing here?


2 Answers


The input to the BufferedReader should come from the path returned by DistributedCache.getLocalCacheFiles() in setup(). Something like:

Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
if (localFiles != null && localFiles.length > 0) {
   brReader = new BufferedReader(new FileReader(localFiles[0].toString()));
}
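
Applied to the code in the question, that means passing the local path into loadHashMap rather than the value read back from conf.get("lookupfile"). A sketch reusing the method names from the question:

Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
if (localFiles != null && localFiles.length > 0) {
    // localFiles[0] points at the copy the framework placed on the task node's local disk
    loadHashMap(localFiles[0], context);
}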

I faced the same issue, and the code below worked for me:

Configuration conf = context.getConfiguration();
URI[] uriList = DistributedCache.getCacheFiles(conf);
BufferedReader br = new BufferedReader(new FileReader(uriList[0].getPath()));

As you can see, I am using the getCacheFiles method here, then fetching the file path and reading the file.
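
If it still fails, a quick check I'd add (my own addition, not strictly required) is to log what the cache actually contains in setup() before trying to read anything:

URI[] uriList = DistributedCache.getCacheFiles(context.getConfiguration());
if (uriList != null) {
    for (URI uri : uriList) {
        // shows up in the task attempt's stderr log, so you can see exactly what was registered
        System.err.println("cache file: " + uri);
    }
}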