I'm new to HDInsight. I want to learn and practice machine learning, and HDInsight is just what I want, but there seems to be no direct API for Mahout. Since a Mahout recommendation essentially translates into MapReduce jobs, I followed a MapReduce example in the Windows Azure documentation and wrote the following code:
// Define the MapReduce job
MapReduceJobCreateParameters mrJobDefinition = new MapReduceJobCreateParameters()
{
    JarFile = "wasb:///example/jars/mahout-core-0.9-job.jar",
    ClassName = "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob",
};
mrJobDefinition.Arguments.Add(" -s SIMILARITY_COOCCURRENCE");
mrJobDefinition.Arguments.Add(" --input=/reply");
mrJobDefinition.Arguments.Add(" --output=/recommend/");
mrJobDefinition.Arguments.Add(" --usersFile=/data/users.txt");
I have already uploaded "mahout-core-0.9-job.jar" to /example/jars in the specified Azure Blob storage container.
But I received the following error message:
14/04/03 12:04:28 ERROR security.UserGroupInformation: PriviledgedActionException as:johnny cause:java.io.IOException: Exception reading file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken=
java.security.PrivilegedActionException: java.io.IOException: Exception reading file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken=
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1233)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:951)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
    at org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob.run(PreparePreferenceMatrixJob.java:77)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:164)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:322)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.io.IOException: Exception reading file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken=
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:136)
    at org.apache.hadoop.mapred.JobClient.readTokensFromFiles(JobClient.java:2149)
    at org.apache.hadoop.mapred.JobClient.populateTokenCache(JobClient.java:2185)
    at org.apache.hadoop.mapred.JobClient.access$300(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:964)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:951)
    ... 16 more
Caused by: java.io.FileNotFoundException: File file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken= does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:427)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:254)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:436)
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:130)
    ... 21 more
Exception in thread "main" java.io.IOException: Exception reading file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken=
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:136)
    at org.apache.hadoop.mapred.JobClient.readTokensFromFiles(JobClient.java:2149)
    at org.apache.hadoop.mapred.JobClient.populateTokenCache(JobClient.java:2185)
    at org.apache.hadoop.mapred.JobClient.access$300(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:964)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:951)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1233)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:951)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
    at org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob.run(PreparePreferenceMatrixJob.java:77)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:164)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:322)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.io.FileNotFoundException: File file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken= does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:427)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:254)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:436)
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:130)
    ... 21 more
Shutting down watcher/keep alive thread pool forcefully
templeton: job failed with exit code 1
After some Googling, it seems that a change should be made to mapred-site.xml or other Hadoop config files. But I'm totally new to Apache Hadoop and don't have much knowledge of Linux or Java.
Any help or direction would be much appreciated.