1
votes

I downloaded the latest version of the examples for chapter 09 of “Mahout in Action”. Most of them run successfully, except for three files: NewsKMeansClustering.java, ReutersToSparseVectors.java, and NewsFuzzyKMeansClustering.java. Running these three programs gives similar error messages:

Aug 3, 2011 2:03:54 PM org.apache.hadoop.metrics.jvm.JvmMetrics init
INFO: Initializing JVM Metrics with processName=JobTracker, sessionId=
Aug 3, 2011 2:03:54 PM org.apache.hadoop.mapred.JobClient configureCommandLineOptions
WARNING: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Aug 3, 2011 2:03:54 PM org.apache.hadoop.mapred.JobClient configureCommandLineOptions
WARNING: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/user1/workspaceMahout1/recommender/inputDir
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
    at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
    at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:93)
    at mia.clustering.ch09.NewsKMeansClustering.main(NewsKMeansClustering.java:54)

I don't quite understand what the two warnings mean. Also, it looks as though the input path should already have been created. How do I create this kind of input? Thanks.

2 Answers

0
votes

You can ignore the warnings. The error is that the input directory you have specified does not exist. Does it exist? What is your command line?
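If it helps, here is a minimal sketch of a check you could run before submitting the job, so that a missing directory fails early with a clearer message. It uses only java.io.File; the class name InputDirCheck, the method requireInputDir, and the default name "inputDir" (taken from the path in your error message) are my own placeholders, not part of Mahout or the book's code.

```java
import java.io.File;

public class InputDirCheck {

    // Hypothetical helper: returns the absolute form of the given path,
    // or throws if it does not point to an existing directory.
    static File requireInputDir(String path) {
        File dir = new File(path).getAbsoluteFile();
        if (!dir.isDirectory()) {
            throw new IllegalArgumentException(
                "Input directory not found: " + dir.getPath());
        }
        return dir;
    }

    public static void main(String[] args) {
        // "inputDir" is assumed from the error message; pass your own path as args[0].
        String path = args.length > 0 ? args[0] : "inputDir";
        System.out.println("Using input directory: " + requireInputDir(path));
    }
}
```

A relative name such as "inputDir" is resolved against the directory the JVM was started in, which is why the error shows it under your workspace.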

0
votes

I ran into a similar mismatch. The MiA files at https://github.com/tdunning/MiA have some cases where a .csv file is left in the same directory as the Java source, for example https://github.com/tdunning/MiA/tree/master/src/main/java/mia/recommender/ch02. When run from Eclipse, however, loading it with DataModel model = new FileDataModel(new File("intro.csv")); doesn't find the file.

Adding

System.out.println("CWD: "+System.getProperty("user.dir"));

...will reveal where Eclipse is looking (in my case, a couple of levels up the file tree, but this may vary depending on exactly how you've set things up).
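A slightly fuller version of the same diagnostic, as a rough sketch: it prints the working directory, the absolute path that a relative file name resolves to, and whether that file exists. The class name PathCheck and the helper resolve are made up for illustration; "intro.csv" is just the file from the example above.

```java
import java.io.File;

public class PathCheck {

    // Hypothetical helper: a relative name is resolved against the
    // JVM's working directory (the "user.dir" system property).
    static File resolve(String name) {
        return new File(name).getAbsoluteFile();
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "intro.csv";
        File f = resolve(name);
        System.out.println("CWD: " + System.getProperty("user.dir"));
        System.out.println("Resolved: " + f.getPath());
        System.out.println("Exists: " + f.exists());
    }
}
```

If "Exists" prints false, either move the file to the resolved location, pass an absolute path, or change the working directory in the Eclipse run configuration.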