
Hadoop newbie here. I just configured a single-node setup and I'm not sure where the input files should be placed. My understanding is that they belong on HDFS, so I added a text file 'zulu.txt' to my HDFS from Eclipse using "Upload file to DFS" (right-click on the DFS).

When I use

String input = "/user/irobot-pc/irobot/In/";

I get the following error:

Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/irobot-pc/irobot/In
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
at com.testdrive.WordCount.main(WordCount.java:37)

But when I point the input at my local file system, it works!

String input = "C:\\Users\\iRobot\\Desktop\\Pop";

Huh?! This doesn't make sense to me. I was under the impression that the Hadoop file system would be read, no?

Question: I'd like to move to a cluster setup, and I know that the files must be located in HDFS. How can I fix this locally before scaling up?



1 Answer


You need to tell your code that you intend to use HDFS and not the local FS, and you can do that with the help of a Configuration object. Add these lines to your job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
// Note: Java will NOT expand $HADOOP_HOME for you -- substitute the
// actual path of your Hadoop installation here.
conf.addResource(new Path("$HADOOP_HOME/conf/core-site.xml"));
conf.addResource(new Path("$HADOOP_HOME/conf/hdfs-site.xml"));
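Those files are where the cluster addresses live. For reference, a minimal single-node core-site.xml looks roughly like this (the port 9000 is just a common convention -- use whatever your own setup says; newer Hadoop versions spell the property fs.defaultFS):

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```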

Otherwise your code will look for the input inside your local FS.
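That also explains the file:/user/... in your stack trace: a path with no scheme is resolved against whatever the default filesystem is, and without core-site.xml on the classpath the default falls back to the local file: scheme. Here is a tiny illustration of that resolution rule using plain java.net.URI (not the Hadoop API; hdfs://localhost:9000 is a made-up NameNode address):

```java
import java.net.URI;

public class SchemeDemo {
    public static void main(String[] args) {
        String input = "/user/irobot-pc/irobot/In";

        // A scheme-less path inherits its filesystem from whatever base it is
        // resolved against -- the same idea Hadoop applies with its default FS.
        URI viaLocalDefault = URI.create("file:///").resolve(input);
        URI viaHdfsDefault = URI.create("hdfs://localhost:9000/").resolve(input);

        System.out.println(viaLocalDefault); // ends up under the local file: scheme
        System.out.println(viaHdfsDefault);  // ends up under hdfs://localhost:9000
    }
}
```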

HTH