0 votes

Inside my map function, I am trying to read a file from the DistributedCache and load its contents into a HashMap.

The stdout log of the MapReduce job prints the contents of the HashMap: the mapper finds the file, loads it into the data structure, iterates over the entries, and prints them. So the load itself succeeds.

However, the MR job still fails with the error below after a few minutes of running:

13/01/27 18:44:21 INFO mapred.JobClient: Task Id : attempt_201301271841_0001_m_000001_2, Status : FAILED
java.io.FileNotFoundException: File does not exist: /app/hadoop/jobs/nw_single_pred_in/predict
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1843)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

Here's the portion that initializes the Path with the location of the file to be placed in the distributed cache:

    // inside main, surrounded by a try/catch block, yet no exception is thrown here
    Configuration conf = new Configuration();
    // rest of the stuff that relates to conf
    Path knowledgefilepath = new Path(args[3]); // args[3] = /app/hadoop/jobs/nw_single_pred_in/predict/knowledge.txt
    DistributedCache.addCacheFile(knowledgefilepath.toUri(), conf);
    job.setJarByClass(NBprediction.class);
    // rest of the job settings
    job.waitForCompletion(true); // submit the job and block until it finishes
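
For comparison, here is a minimal driver sketch that puts these pieces in one place. The class name NBprediction and the cache-file argument position come from the snippet above; the input/output argument positions, job name, and mapper wiring are my assumptions, not taken from the original code. One known pitfall worth noting: with the new API, Job copies the Configuration it is constructed with, so cache files must be added to conf before the Job is created (or via job.getConfiguration()):

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class NBprediction {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Register the knowledge file with the distributed cache BEFORE
            // constructing the Job: Job copies the Configuration, so changes
            // made to conf afterwards are not visible to the job.
            DistributedCache.addCacheFile(new URI(args[3]), conf);

            Job job = new Job(conf, "nb-prediction"); // job name is an assumption
            job.setJarByClass(NBprediction.class);
            // job.setMapperClass(...), output key/value types, etc. go here

            // Hypothetical argument positions for the job's input and output paths.
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }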

This one is inside the map function:

    try {
        System.out.println("Inside try !!");
        Path[] files = DistributedCache.getLocalCacheFiles(context.getConfiguration());
        Path cfile = new Path(files[0].toString()); // only one file
        System.out.println("File path : " + cfile.toString());
        CSVReader reader = new CSVReader(new FileReader(cfile.toString()), '\t');
        String[] nline;
        while ((nline = reader.readNext()) != null) {
            data.put(nline[0], Double.parseDouble(nline[1])); // load into a HashMap
        }
        reader.close();
    } catch (Exception e) {
        // handle exception
    }
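
As an aside, since map() runs once per input record, a common pattern is to do this load once in the mapper's setup() method instead. A minimal sketch, assuming a new-API mapper with LongWritable/Text input and the opencsv CSVReader used above (the class name PredictionMapper and the key/value types are my assumptions; data and nline mirror the snippet above):

    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    import au.com.bytecode.opencsv.CSVReader;

    public class PredictionMapper extends Mapper<LongWritable, Text, Text, Text> {

        private final Map<String, Double> data = new HashMap<String, Double>();

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // Load the cached knowledge file once per task instead of once per record.
            Path[] files = DistributedCache.getLocalCacheFiles(context.getConfiguration());
            if (files == null || files.length == 0) {
                throw new IOException("Distributed cache file not found");
            }
            CSVReader reader = new CSVReader(new FileReader(files[0].toString()), '\t');
            try {
                String[] nline;
                while ((nline = reader.readNext()) != null) {
                    data.put(nline[0], Double.parseDouble(nline[1]));
                }
            } finally {
                reader.close();
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // ... use `data` to score the input record ...
        }
    }

This also makes failures easier to spot: if the cache file is missing, the task fails immediately in setup() with a clear message rather than partway through the map phase.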

Help appreciated.

Cheers!

It's very hard to figure out what's wrong if you don't share the part of your code where you're using the distributed cache. – Charles Menguy

Is /app/hadoop/jobs/nw_single_pred_in/predict the absolute path of the file, or the directory where the file resides? – shazin

@shazin It is the directory on HDFS where the file resides. – stholy

@CharlesMenguy Added code. Please see post. – stholy

1 Answer

0 votes

I did a fresh installation of Hadoop and ran the job with the same jar, and the problem disappeared. It seems to have been a bug rather than a programming error.