
I am running a task on Hadoop 2:

$hadoop jar hipi.jar "/5" "/processWOH" 1

hipi.jar: the jar file name

"/5": the input folder name

"/processWOH": the output folder name

I am getting an exception regarding the path /localhost:9000/5/LC814000.tif:

Error: java.io.FileNotFoundException: /localhost:9000/5/LC814000.tif (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at ProcessWithoutHIPI.ProcessRecordReaderWOH.getCurrentKey(ProcessRecordReaderWOH.java:81)
        at ProcessWithoutHIPI.ProcessRecordReaderWOH.getCurrentKey(ProcessRecordReaderWOH.java:1)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getCurrentKey(MapTask.java:507)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.getCurrentKey(MapContextImpl.java:70)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.getCurrentKey(WrappedMapper.java:81)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

I think (though I am not sure) the problem is the extra "/localhost:9000" added to the path, but I don't know how it gets added (by Hadoop, the Java code, ...).

Note: this jar file runs fine outside of Hadoop, but inside Hadoop (HDFS) it does not.

Any help is appreciated

Update: As I discovered later, the "/5" folder is searched for in the local file system, not in HDFS. If I create a folder named "localhost:9000" under root in the local file system (i.e. /localhost:9000) and put "/5" inside it, the code runs, but then the data is read from outside Hadoop entirely, as if I were not using Hadoop at all. So is this a programming mistake, i.e. should I be using the Hadoop IO packages instead of the Java IO packages to work with HDFS rather than the local file system, or is it another problem?

Imi.Cino: The prefix /localhost:9000 is part of the HDFS path; please execute the following command and paste the result: $hadoop fs -ls /localhost:9000/
Mosab Shaheen: @Imi.Cino Thanks. I am out of the office now; tomorrow morning I'll run it and submit the result.
Imi.Cino: /localhost:9000 is not a folder; 9000 is the port of your HDFS! You can see it in your core-site.xml. Please show your core-site.xml and your MapReduce program.
Mosab Shaheen: @Imi.Cino That's true for the correct case, but in my case it is taking all the paths from the local system, not HDFS, and I don't know why. Because of that I created a folder /localhost:9000/ in the local system and it worked, but now all the data is read and written outside Hadoop!
Mosab Shaheen: @Imi.Cino <configuration> <property> <name>fs.default.name</name> <value>hdfs://localhost:9000</value> </property> </configuration>

2 Answers

1 vote

The default directory of your HDFS is /localhost:9000/, and Hadoop cannot find your input file there; just put it into /localhost:9000/:

$hadoop fs -put $LOCAL_PATH_OF_INPUT_FILE/5 /localhost:9000/
$hadoop jar hipi.jar "/5" "/processWOH" 1

Good luck!

0 votes

The problem, as I said earlier, is that Java IO (i.e. the File class, Path class, ...) treats paths as local file system paths, whereas Hadoop IO (the FileSystem class, Path class, ...) treats paths as HDFS paths.
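To illustrate the difference, here is a minimal sketch (the file name is taken from your stack trace; the class name is hypothetical). It assumes fs.defaultFS is hdfs://localhost:9000, as in your core-site.xml:

import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalVsHdfs {

    public static void main(String[] args) throws Exception {
        // Java IO resolves the path against the LOCAL file system, so a string
        // built from an HDFS URI such as hdfs://localhost:9000/5/LC814000.tif
        // becomes the local path /localhost:9000/5/LC814000.tif -- exactly
        // the FileNotFoundException in the question:
        // InputStream broken = new java.io.FileInputStream("/localhost:9000/5/LC814000.tif");

        // Hadoop IO resolves the same path against fs.defaultFS, i.e. inside HDFS
        Configuration conf = new Configuration(); // picks up core-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);
        try (InputStream in = fs.open(new Path("/5/LC814000.tif"))) {
            System.out.println("opened /5/LC814000.tif from HDFS");
        }
    }
}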

Please have a look here: read/write from/in HDFS

Using FileSystem API to read and write data to HDFS

Reading data from and writing data to the Hadoop Distributed File System (HDFS) can be done in a number of ways. Let us start by using the FileSystem API to create and write to a file in HDFS, followed by an application to read a file from HDFS and write it back to the local file system.

Step 1: Once you have downloaded a test dataset, we can write an application to read a file from the local file system and write the contents to Hadoop Distributed File System.

package com.hadoop.hdfs.writer;

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class HdfsWriter extends Configured implements Tool {

    public static final String FS_PARAM_NAME = "fs.defaultFS";

    public int run(String[] args) throws Exception {

        if (args.length < 2) {
            System.err.println("HdfsWriter [local input path] [hdfs output path]");
            return 1;
        }

        String localInputPath = args[0];
        Path outputPath = new Path(args[1]);

        Configuration conf = getConf();
        System.out.println("configured filesystem = " + conf.get(FS_PARAM_NAME));
        // FileSystem.get() returns the implementation matching fs.defaultFS
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(outputPath)) {
            System.err.println("output path exists");
            return 1;
        }
        // The write side lives in HDFS; the read side is the local file system
        OutputStream os = fs.create(outputPath);
        InputStream is = new BufferedInputStream(new FileInputStream(localInputPath));
        // copyBytes() streams the data across and closes both streams when done
        IOUtils.copyBytes(is, os, conf);
        return 0;
    }

    public static void main( String[] args ) throws Exception {
        int returnCode = ToolRunner.run(new HdfsWriter(), args);
        System.exit(returnCode);
    }
}

Step 2: Export the jar file and run the code from the terminal to write a sample file to HDFS:

[training@localhost ~]$ hadoop jar HdfsWriter.jar com.hadoop.hdfs.writer.HdfsWriter sample.txt /user/training/HdfsWriter_sample.txt

Step 3: Verify whether the file is written into HDFS and check the contents of the file:

[training@localhost ~]$ hadoop fs -cat /user/training/HdfsWriter_sample.txt

Step 4: Next, we write an application to read the file we just created in Hadoop Distributed File System and write its contents back to the local file system:

package com.hadoop.hdfs.reader;

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class HdfsReader extends Configured implements Tool {

    public static final String FS_PARAM_NAME = "fs.defaultFS";

    public int run(String[] args) throws Exception {

        if (args.length < 2) {
            System.err.println("HdfsReader [hdfs input path] [local output path]");
            return 1;
        }

        Path inputPath = new Path(args[0]);
        String localOutputPath = args[1];
        Configuration conf = getConf();
        System.out.println("configured filesystem = " + conf.get(FS_PARAM_NAME));
        FileSystem fs = FileSystem.get(conf);
        // The read side lives in HDFS; the write side is the local file system
        InputStream is = fs.open(inputPath);
        OutputStream os = new BufferedOutputStream(new FileOutputStream(localOutputPath));
        // copyBytes() streams the data across and closes both streams when done
        IOUtils.copyBytes(is, os, conf);
        return 0;
    }

    public static void main( String[] args ) throws Exception {
        int returnCode = ToolRunner.run(new HdfsReader(), args);
        System.exit(returnCode);
    }
}

Step 5: Export the jar file and run the code from the terminal to read the sample file from HDFS and write it back to the local file system:

[training@localhost ~]$ hadoop jar HdfsReader.jar com.hadoop.hdfs.reader.HdfsReader /user/training/HdfsWriter_sample.txt /home/training/HdfsReader_sample.txt

Step 6: Verify that the file was written back to the local file system:

[training@localhost ~]$ cat /home/training/HdfsReader_sample.txt

FileSystem is an abstract class that represents a generic file system. Most Hadoop file system implementations can be accessed and updated through the FileSystem object. To create an instance for HDFS, you call FileSystem.get(). The FileSystem.get() method looks at the URI assigned to the fs.defaultFS parameter in the Hadoop configuration files on your classpath and chooses the correct implementation of the FileSystem class to instantiate. For HDFS, fs.defaultFS has a value of the form hdfs://host:port, e.g. hdfs://localhost:9000 in your core-site.xml.
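If the configuration files on the classpath are missing or point at the wrong file system, you can also pin the URI explicitly. A minimal sketch, using the namenode address from your core-site.xml:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Connect straight to the namenode, ignoring whatever fs.defaultFS
// happens to be configured on the classpath
FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), new Configuration());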

Once an instance of the FileSystem class has been created, the HdfsWriter class calls the create() method to create a file in HDFS. The create() method returns an OutputStream object, which can be manipulated using normal Java I/O methods. Similarly, HdfsReader calls the open() method to open a file in HDFS, which returns an InputStream object that can be used to read the contents of the file.

The FileSystem API is extensive. To demonstrate some of the other methods available in the API, we can add some error checking to the HdfsWriter and HdfsReader classes we created.

To check whether the file exists before we call create(), use:

boolean exists = fs.exists(inputPath);

To check whether the path is a file, use:

boolean isFile = fs.isFile(inputPath);

To rename a file that already exists, use:

boolean renamed = fs.rename(inputPath, new Path("old_file.txt"));
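Putting these checks together, here is a sketch of how they might slot into the beginning of HdfsReader's run() method (one possible arrangement, not the only one):

Path inputPath = new Path(args[0]);
FileSystem fs = FileSystem.get(getConf());

// Fail fast with a clear message instead of letting open() throw later
if (!fs.exists(inputPath)) {
    System.err.println("input path does not exist: " + inputPath);
    return 1;
}
if (!fs.isFile(inputPath)) {
    System.err.println("input path is not a file: " + inputPath);
    return 1;
}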