I have been trying the Mahout clustering example. I wrote a sample Java program that converts text documents into a sequence file and then converts the sequence file into vectors. When I run it, I get the following exception, even though all the required directories exist and the data has been copied into them.

14/06/26 08:45:35 ERROR security.UserGroupInformation: PriviledgedActionException as:shshaikh cause:java.io.FileNotFoundException: File file:/home/shshaikh/ClusterWorkDir/sequence/vector/data does not exist.
java.io.FileNotFoundException: File file:/home/shshaikh/ClusterWorkDir/sequence/vector/data does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:402)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:255)
    at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:63)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1071)
    at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
    at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:93)
    at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:257)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at com.hello.mahout.MyZFuzzyKmeans.vectorize(MyZFuzzyKmeans.java:100)
    at com.hello.mahout.MyZFuzzyKmeans.main(MyZFuzzyKmeans.java:55)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
    at java.lang.Thread.run(Thread.java:679)

I created the vector directory myself, but the program deletes it and then fails with the FileNotFoundException above.

Can someone please help me resolve this issue?

Thanks :)

1 Answer


By default, Mahout runs against the distributed file system rather than the local file system. So when you run a mahout command, Hadoop looks for the files on HDFS instead of on the local disk. To make Mahout use the local file system, do the following:
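Before changing anything, it can help to confirm where the path from the exception actually exists. The following is only a sketch: `check_local` is a hypothetical helper name, the path is copied from the stack trace in the question, and the second check assumes the `hadoop` client is on your PATH (it is skipped otherwise).

```shell
#!/bin/sh
# Hypothetical helper: report whether a path exists on the local file system.
check_local() {
  if [ -e "$1" ]; then
    echo "local: $1 exists"
  else
    echo "local: $1 is missing"
  fi
}

# Path taken from the exception in the question; replace it with your own.
INPUT=/home/shshaikh/ClusterWorkDir/sequence/vector/data

check_local "$INPUT"

# If the hadoop client is installed, list the same path on whatever
# file system Hadoop is configured to use by default.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -ls "$INPUT" || echo "hadoop: $INPUT is missing"
fi
```

If the path shows up in one place but not the other, the job is probably reading from a different file system than the one the data was copied to.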

  1. cd to the mahout/bin directory
  2. vi mahout
  3. find the line "#MAHOUT_LOCAL=true;" and change it to "MAHOUT_LOCAL=true;"
  4. source mahout

Mahout should then run against the local file system.
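As an alternative to editing the script, the variable can be set in the shell that launches Mahout. This is a minimal sketch, assuming your copy of bin/mahout reads MAHOUT_LOCAL from the environment (check the comments at the top of the script to confirm). MAHOUT_HOME and the two directory placeholders are assumptions, not values from the question, and the setting only applies when Mahout is started through bin/mahout.

```shell
#!/bin/sh
# Ask the mahout launcher script to use the local file system.
export MAHOUT_LOCAL=true

# Confirm the value that child processes will see.
echo "MAHOUT_LOCAL=$MAHOUT_LOCAL"

# Then run Mahout from the same shell, for example (placeholders, not real paths):
# "$MAHOUT_HOME/bin/mahout" seq2sparse -i <sequence-dir> -o <vector-dir>
```

Setting the variable this way lasts only for the current shell session, which makes it easy to switch back to HDFS later.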