I built and ran Nutch 1.7 from the command line just fine:
hadoop jar apache-nutch-1.7.job org.apache.nutch.crawl.Crawl hdfs://myserver/nutch/urls -dir hdfs://myserver/nutch/crawl -depth 5 -topN 100
but when I run the same thing from Oozie, it keeps failing with: Wrong FS: hdfs://myserver/nutch/crawl/crawldb/current, expected: file:///
I checked the source; every time the code executes
FileSystem fs = new JobClient(job).getFs();
the fs gets reset to the local filesystem.
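For context, here is a minimal sketch of the kind of override I tried: deriving the FileSystem from the path itself (which carries the hdfs:// scheme) instead of from the JobClient. The crawldb path below is just an illustration taken from the error message, not the actual Nutch code:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class FsCheck {
    public static void main(String[] args) throws Exception {
        JobConf job = new JobConf();
        // Derive the FileSystem from the path, so the hdfs:// scheme in the
        // path wins over whatever default FS the client resolves to
        Path crawlDb = new Path("hdfs://myserver/nutch/crawl/crawldb/current");
        FileSystem fs = crawlDb.getFileSystem(job);
        System.out.println(fs.getUri()); // should print hdfs://myserver, not file:///
    }
}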
I overrode all instances of these statements along those lines, but then the job dies in the fetch stage and says nothing more than: java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:838)
It really looks as if running from Oozie causes the wrong version of the JobClient class (from hadoop-core.jar) to be loaded.
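One way to check that theory (just a sketch, assuming you can run a small class or add a log line inside the Oozie launcher) is to print the jar the JobClient class was actually loaded from:

import org.apache.hadoop.mapred.JobClient;

public class WhichJar {
    public static void main(String[] args) {
        // getCodeSource() can be null for bootstrap classes, but for a class
        // loaded from a jar on the classpath it returns that jar's URL
        System.out.println(JobClient.class
                .getProtectionDomain()
                .getCodeSource()
                .getLocation());
    }
}

Running that from the plain command line versus inside the Oozie launcher should show whether a different hadoop-core jar wins on the classpath in the two cases.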
Has anyone seen this before?