0
votes

I have a MapReduce programme that executes correctly locally.

It uses a file called new-positions.csv in the setup() method of the mapper class to populate a hash table in memory:

public void setup(Context context) throws IOException, InterruptedException {
    newPositions = new Hashtable<String, Integer>();
    File file = new File("new-positions.csv");

    Scanner inputStream = new Scanner(file);
    inputStream.nextLine(); // skip the header line
    while (inputStream.hasNext()) {
        String line = inputStream.nextLine();
        String[] splitLine = line.split(",");
        Integer id = Integer.valueOf(splitLine[0].trim());
        // String firstname = splitLine[1].trim();
        // String surname = splitLine[2].trim();

        // Columns 3-6 hold up to four e-mail addresses.
        String[] emails = new String[4];
        for (int i = 3; i < 7; i++) {
            emails[i - 3] = splitLine[i].trim();
        }
        for (String email : emails) {
            if (!email.equals("")) newPositions.put(email, id);
        }
        // String position = splitLine[7].trim();
    }
    inputStream.close(); // close once, after the loop, not inside it
}

The Java programme has been exported to an executable JAR. That JAR and new-positions.csv are both saved in the same directory on our local filesystem.

Then, while inside that directory, we execute the following at the terminal (we have also tried it with the full pathname for new-positions.csv):

hadoop jar MR2.jar Reader2 -files new-positions.csv InputDataset OutputFolder

The job launches fine, but when it gets to the mapper we get:

Error: java.io.FileNotFoundException: new-positions.csv (No such file or directory)

This file definitely exists locally, and we are definitely executing from within that directory.

We are following the guidance given in Hadoop: The Definitive Guide (4th Ed.), p. 274 onwards, and cannot see how our programme and arguments differ in structure.

Could it be something to do with the Hadoop configuration? We know there are workarounds, such as copying the file to HDFS and executing from there, but we need to understand why this "-files" argument isn't working as anticipated.
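For clarity, the HDFS workaround we mean would look roughly like this in the mapper, reading through Hadoop's FileSystem API instead of java.io.File. This is a sketch only; the HDFS path is a placeholder, assuming the file has first been uploaded with hadoop fs -put:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public void setup(Context context) throws IOException, InterruptedException {
        newPositions = new Hashtable<String, Integer>();
        // Placeholder HDFS location for the uploaded copy of the file.
        Path csvPath = new Path("/user/someuser/new-positions.csv");
        FileSystem fs = FileSystem.get(context.getConfiguration());
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(csvPath)))) {
            reader.readLine(); // skip the header line
            String line;
            while ((line = reader.readLine()) != null) {
                // ...parse exactly as in the setup() shown above...
            }
        }
    }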

EDIT: Below is some code from the driver class, which may also be the source of the problem:

public int run(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
     if (args.length != 5) {
         printUsage(this, " ");
         return 1;
     }

     Configuration config = getConf();

     FileSystem fs = FileSystem.get(config);

     Job job = Job.getInstance(config);
     job.setJarByClass(this.getClass());
     FileInputFormat.addInputPath(job, new Path(args[3]));

     // Delete old output if necessary
     Path outPath = new Path(args[4]);
     if (fs.exists(outPath)) 
         fs.delete(outPath, true);

     FileOutputFormat.setOutputPath(job, new Path(args[4]));

     job.setInputFormatClass(SequenceFileInputFormat.class);

     job.setOutputKeyClass(NullWritable.class);
     job.setOutputValueClass(Text.class);

     job.setMapOutputKeyClass(EdgeWritable.class);
     job.setMapOutputValueClass(NullWritable.class);

     job.setMapperClass(MailReaderMapper.class);
     job.setReducerClass(MailReaderReducer.class);

     job.setJar("MR2.jar");


     boolean status = job.waitForCompletion(true);
     return status ? 0 : 1;
 }

 public static void main(String[] args) throws Exception {
     int exitCode = ToolRunner.run(new Reader2(), args);
     System.exit(exitCode);
 }

3 Answers

0
votes

Let's assume your new-positions.csv is present in the folder H:/HDP/. Then you need to pass the file as:

file:///H:/HDP/new-positions.csv

You need to qualify the path with file:/// to indicate that it is a local filesystem path. You also need to pass the fully qualified path.

This works perfectly for me.

For example, I pass the local file myini.ini as below:

yarn jar hadoop-mapreduce-examples-2.4.0.2.1.5.0-2060.jar teragen -files "file:///H:/HDP/hadoop-2.4.0.2.1.5.0-2060/share/hadoop/common/myini.ini" -Dmapreduce.job.maps=10 10737418 /usr/teraout/
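Applied to the command in your question, that would look something like the following (the /home/user/FinalProject path is only a placeholder for wherever the file actually lives):

hadoop jar MR2.jar Reader2 -files "file:///home/user/FinalProject/new-positions.csv" InputDataset OutputFolder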

0
votes

I think Manjunath Ballur gave you a correct answer, but the URI you passed, file:///home/local/xxx360/FinalProject/new-positions.csv, may not be resolvable from the Hadoop worker machine.

That path looks like an absolute path on a single machine, but which machine contains /home? Add a server to the path and I think it might work.

Alternatively, if you use the singular -file, it looks like Hadoop will copy the file rather than make a symbolic link as it does with -files.

Please see the documentation here.
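One way to check what the task actually received is to dump the distributed-cache URIs from the mapper's setup() method. A minimal diagnostic sketch (not your code, and only for inspection):

    // Diagnostic sketch: print the URIs registered by -files for this job.
    // "context" is the Mapper.Context passed to setup().
    java.net.URI[] cacheFiles = context.getCacheFiles();
    if (cacheFiles != null) {
        for (java.net.URI uri : cacheFiles) {
            System.err.println("Distributed cache file: " + uri);
        }
    }

If nothing is printed, the -files option probably never reached the job's configuration.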

0
votes

I do not see anything wrong in your code. From my working code, which is technically the same as yours, I also got java.io.FileNotFoundException when I added a hyphen (-) to the file name. Remove the hyphen and try again:

    File file = new File("newpositions.csv");

    hadoop jar MR2.jar Reader2 -files newpositions.csv InputDataset OutputFolder