I built and ran Nutch 1.7 from the command line just fine:
hadoop jar apache-nutch-1.7.job org.apache.nutch.crawl.Crawl hdfs://myserver/nutch/urls -dir hdfs://myserver/nutch/crawl -depth 5 -topN 100
but when I run the same thing from Oozie, it keeps failing with: Wrong FS: hdfs://myserver/nutch/crawl/crawldb/current, expected: file:///
I checked the source; every time the code executes
FileSystem fs = new JobClient(job).getFs();
the fs gets reset to the local filesystem.
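For context, here is a minimal sketch of the kind of override I tried: deriving the FileSystem from the path itself (which carries the hdfs:// scheme) instead of from the JobClient. The crawldb path below is just an illustration taken from the error message, not the actual Nutch code:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class FsCheck {
    public static void main(String[] args) throws Exception {
        JobConf job = new JobConf();
        // Derive the FileSystem from the path, so the hdfs:// scheme in the
        // path wins over whatever default FS the client resolves to
        Path crawlDb = new Path("hdfs://myserver/nutch/crawl/crawldb/current");
        FileSystem fs = crawlDb.getFileSystem(job);
        System.out.println(fs.getUri()); // should print hdfs://myserver, not file:///
    }
}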
I overrode all instances of these statements along those lines, but then the job dies in the fetch stage and says nothing more than: java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:838)
It really looks as if running from Oozie causes the wrong version of the JobClient class (from hadoop-core.jar) to be loaded.
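One way to check that theory (just a sketch, assuming you can run a small class or add a log line inside the Oozie launcher) is to print the jar the JobClient class was actually loaded from:

import org.apache.hadoop.mapred.JobClient;

public class WhichJar {
    public static void main(String[] args) {
        // getCodeSource() can be null for bootstrap classes, but for a class
        // loaded from a jar on the classpath it returns that jar's URL
        System.out.println(JobClient.class
                .getProtectionDomain()
                .getCodeSource()
                .getLocation());
    }
}

Running that from the plain command line versus inside the Oozie launcher should show whether a different hadoop-core jar wins on the classpath in the two cases.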
Has anyone seen this before?