
I have a problem running the Nutch inject step. This is the command I am running:

bin/nutch inject bin/crawl/crawldb bin/urls

After running the above command, I get the following error:

Injector: starting at 2014-04-02 13:02:29
Injector: crawlDb: bin/crawl/crawldb
Injector: urlDir: bin/urls/seed.txt
Injector: Converting injected urls to crawl db entries.
Injector: total number of urls rejected by filters: 2
Injector: total number of urls injected after normalization and filtering: 0
Injector: Merging injected urls into crawl db.
Injector: overwrite: false
Injector: update: false
Injector: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:294)
    at org.apache.nutch.crawl.Injector.run(Injector.java:316)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Injector.main(Injector.java:306)

I am running Nutch for the first time. I have checked that Solr and Nutch are installed properly.

The details below are from the log file:

java.io.IOException: The temporary job-output directory file:/usr/share/apache-nutch-1.8/bin/crawl/crawldb/1639805438/_temporary doesn't exist!
    at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
    at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
    at org.apache.hadoop.mapred.MapFileOutputFormat.getRecordWriter(MapFileOutputFormat.java:46)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:449)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:491)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
2014-04-02 12:54:46,251 ERROR crawl.Injector - Injector: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:294)
    at org.apache.nutch.crawl.Injector.run(Injector.java:316)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Injector.main(Injector.java:306)
According to your logs, you have a permissions problem. This job probably doesn't have permission to create a folder inside /usr/... - Mysterion
@Mysterion Thank you for the response. As you suggested, I have changed the permissions, but I am still getting the same error. - Lussi
Solved the above error. - Lussi
But Nutch is not fetching URLs from the seed file. Can anyone help? - Lussi
How did you solve it? Please update the question. - Mysterion

2 Answers

0
votes

I was using the command bin/nutch inject bin/crawl/crawldb bin/urls to inject

instead of bin/nutch inject crawl/crawldb bin/urls

Switching to the second command solved the error.

As for fetching URLs, I made changes to the regex-urlfilter.txt file, and now I am able to fetch the URLs.
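
For anyone who sees "total number of urls rejected by filters" equal to the number of seed URLs, here is a rough Python sketch of how the rules in regex-urlfilter.txt are applied, as far as I understand it: each rule is a + (accept) or - (reject) followed by a regex, the first rule that matches decides, and a URL that matches no rule is rejected. The sample rules, the example.org domain, and the function names are made up for illustration and are not the changes made above. Nutch uses Java regexes, so Python's re module only approximates it.

```python
import re

# Hypothetical rules in the regex-urlfilter.txt style (not the actual file from this question).
SAMPLE_RULES = r"""
# skip file:, ftp:, and mailto: urls
-^(file|ftp|mailto):
# accept anything under example.org (made-up domain)
+^https?://([a-z0-9-]+\.)*example\.org/
"""


def load_rules(text):
    """Parse rule lines into (accept, compiled_regex) pairs; skip blanks and # comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first matching rule; reject when nothing matches."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


rules = load_rules(SAMPLE_RULES)
for url in ["http://www.example.org/", "http://other.test/", "ftp://example.org/file"]:
    print(url, "accepted" if accepts(rules, url) else "rejected")
```

If every seed URL comes out rejected with your own rules, the accept pattern is the first thing to check.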

0
votes

Make sure you don't have any syntax errors in any of your Nutch config files.
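
One quick way to check the XML ones (for example nutch-site.xml) is to run them through an XML parser. A minimal Python sketch, assuming the config files sit in a directory called conf under the Nutch install; the function name is my own, and this only catches malformed XML, not wrong property names or values:

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def find_xml_errors(conf_dir):
    """Return {file name: parser message} for every *.xml file that is not well-formed."""
    errors = {}
    for path in sorted(Path(conf_dir).glob("*.xml")):
        try:
            ET.parse(str(path))
        except ET.ParseError as exc:
            errors[path.name] = str(exc)
    return errors


# "conf" is an assumed location; point this at your own config directory.
for name, message in find_xml_errors("conf").items():
    print(f"{name}: {message}")
```

No output means every XML file in that directory parsed cleanly.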