3
votes

I asked a similar question a while back, but at the time I had no idea what I was talking about. I am posting this question with further details and more to-the-point queries.

So I have set up a Hadoop cluster with a namenode and 2 datanodes, running Hadoop 2.9.0. I ran the command hdfs dfs -put "SomeRandomFile" and it seems to work fine. The only confusion I have is why it stores my file under the /user/hduser/ path. I didn't specify this path anywhere in the configuration, so how is it building this path on HDFS?
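For illustration, a quick sketch of the behavior (assuming a running cluster and the `hduser` account from the question): the `dfs -put` command resolves relative destination paths against the current user's HDFS home directory, which defaults to /user/&lt;username&gt;.

```shell
# A relative destination is resolved against /user/<username> by default,
# so this lands in /user/hduser/SomeRandomFile:
hdfs dfs -put SomeRandomFile

# An absolute destination is used as-is:
hdfs dfs -put SomeRandomFile /tmp/SomeRandomFile

# Verify where the file ended up:
hdfs dfs -ls /user/hduser
```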

Furthermore, I created a small Java program to do the same thing. I created a simple Eclipse project and wrote the following lines:

//Imports needed by this snippet:
import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public static boolean fileWriteHDFS(InputStream input, String fileName) {
    try {
        System.setProperty("HADOOP_USER_NAME", "hduser");

        //Get configuration of the Hadoop system
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");

        //Build the destination path
        URI uri = URI.create(DESTINATION_PATH + fileName);
        Path path = new Path(uri);

        //Destination file system (HDFS)
        FileSystem fs = FileSystem.get(uri, conf);

        //Check if the file already exists
        if (fs.exists(path))
        {
            //Write an appropriate error to the log file and return.
            return false;
        }

        //Create an output stream to the destination path
        FSDataOutputStream out = fs.create(path);

        //Copy from the input stream to HDFS; the final `true`
        //argument closes both streams when the copy finishes
        IOUtils.copyBytes(input, out, 4096, true);

        //Release the FileSystem handle
        fs.close();
        //All went as planned
        return true;
    } catch (Exception e) {
        //Something went wrong
        System.out.println(e.toString());
        return false;
    }
}

And I added the following three Hadoop libraries:

/home/hduser/bin/hadoop-2.9.0/share/hadoop/common/hadoop-common-2.9.0.jar
/home/hduser/bin/hadoop-2.9.0/share/hadoop/common/hadoop-common-2.9.0-tests.jar
/home/hduser/bin/hadoop-2.9.0/share/hadoop/common/hadoop-nfs-2.9.0.jar

As you can see, my Hadoop installation location is /home/hduser/bin/hadoop-2.9.0/... When I run this code, it throws an exception, i.e.:

Exception in thread "main" java.lang.NoClassDefFoundError: com/ctc/wstx/io/InputBootstrapper
    at com.ws.filewrite.fileWrite.fileWriteHDFS(fileWrite.java:21)
    at com.ws.main.listenerService.main(listenerService.java:21)
Caused by: java.lang.ClassNotFoundException: com.ctc.wstx.io.InputBootstrapper
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 2 more

Specifically, the exception is thrown at the line:

Configuration conf = new Configuration();

Am I missing something here? What is causing this problem? I am completely new to HDFS, so pardon me if this is an obvious problem.

First, you don't need the -tests JAR; second, there are other libraries those JARs depend on. Suggestion: use Maven instead of manually adding JAR files to your classpath - OneCricketeer
"why does it store my file to /user/hduser/ path" ... Because that is the default behavior of the CLI dfs -put command - OneCricketeer
@cricket_007 Thanks mate. I am not familiar with Maven; can you suggest some links that can enhance my knowledge in this regard? - usamazf
Yes, we faced the same issue in Hadoop 3.2.1. I used the code at this link for uploading a sample: javadeveloperzone.com/hadoop/java-read-write-files-hdfs-example - BASS KARAN

1 Answer

2
votes

Hadoop 2.9's dependencies are not the same as Hadoop 2.6's.

I ran into the same situation and tried to track down the missing dependency JAR by hand. That is difficult, and another JAR may turn out to be missing the next time...

So I use Maven to manage the dependencies.

Just add these two dependencies and the problem will be solved:

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.9.0</version>
        <!--<scope>provided</scope>-->
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.9.0</version>
    </dependency>
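For context, a minimal pom.xml wrapping these two dependencies might look like the sketch below (the `groupId`, `artifactId`, and `version` of the project itself are placeholders for your own coordinates):

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>

    <!-- placeholder coordinates for your own project -->
    <groupId>com.example</groupId>
    <artifactId>hdfs-writer</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.9.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.9.0</version>
        </dependency>
    </dependencies>
</project>
```

Maven then pulls in all transitive dependencies (including the Woodstox JAR that provides com.ctc.wstx.io.InputBootstrapper), which is exactly what the manual three-JAR classpath was missing.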