1
votes

Command: ./crawl /urls /mydir XXXXX 2

When I run this command on Hadoop 2.5.1 with Nutch 2.2.1, I get the following errors.

14/10/07 19:58:10 INFO mapreduce.Job: Running job: job_1411692996443_0016
14/10/07 19:58:17 INFO mapreduce.Job: Job job_1411692996443_0016 running in uber mode : false
14/10/07 19:58:17 INFO mapreduce.Job: map 0% reduce 0%
14/10/07 19:58:21 INFO mapreduce.Job: Task Id : attempt_1411692996443_0016_m_000000_0, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
14/10/07 19:58:26 INFO mapreduce.Job: Task Id : attempt_1411692996443_0016_m_000000_1, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
14/10/07 19:58:31 INFO mapreduce.Job: Task Id : attempt_1411692996443_0016_m_000000_2, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
14/10/07 19:58:36 INFO mapreduce.Job: map 100% reduce 0%
14/10/07 19:58:36 INFO mapreduce.Job: Job job_1411692996443_0016 failed with state FAILED due to: Task failed task_1411692996443_0016_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
14/10/07 19:58:36 INFO mapreduce.Job: Counters: 12

Job Counters 
    Failed map tasks=4
    Launched map tasks=4
    Other local map tasks=3
    Data-local map tasks=1
    Total time spent by all maps in occupied slots (ms)=11785
    Total time spent by all reduces in occupied slots (ms)=0
    Total time spent by all map tasks (ms)=11785
    Total vcore-seconds taken by all map tasks=11785
    Total megabyte-seconds taken by all map tasks=12067840
Map-Reduce Framework
    CPU time spent (ms)=0
    Physical memory (bytes) snapshot=0
    Virtual memory (bytes) snapshot=0

14/10/07 19:58:36 ERROR crawl.InjectorJob: InjectorJob: java.lang.RuntimeException: job failed: name=[/mydir]inject /urls, jobid=job_1411692996443_0016

    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:55)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:233)
    at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:251)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:273)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:282)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
So, what is the question? - Prophet
How will the crawling be done? Are there any good suggestions for configuring Nutch 2.2.1 on a Hadoop 2.5.1 cluster? Many thanks in advance. - emailfeifan
It's pretty obvious what the question is: how do you fix the error? - Dr.Knowitall

2 Answers

0
votes

You are probably using Gora (or something else) compiled against Hadoop 1 (from the Maven repo?). In Hadoop 1.x, org.apache.hadoop.mapreduce.TaskAttemptContext was a class; in Hadoop 2.x it became an interface, so jars compiled against Hadoop 1 fail at runtime on Hadoop 2 with exactly this error. You can download Gora (0.5?) and build it against Hadoop 2.
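
A minimal sketch of that rebuild (the tag name, and the assumption that the 0.5 sources build against Hadoop 2 by default, should be verified against the Gora release notes):

    # Sketch: build Gora from source against Hadoop 2 instead of using the
    # Hadoop 1 build pulled from the Maven repo. The tag name below is an
    # assumption -- check https://gora.apache.org for the actual 0.5 release.
    git clone https://github.com/apache/gora.git
    cd gora
    git checkout apache-gora-0.5
    mvn clean install -DskipTests
    # Then replace the gora-*.jar files that Nutch pulls in (Ivy cache or lib/)
    # with the locally built ones.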

Perhaps this is just the first problem in a series. Please let us know how your next steps go.

0
votes

I had a similar error on Nutch 2.x with Hadoop 2.4.0.

Recompile Nutch with Hadoop 2.5.1 dependencies (via Ivy) and exclude all Hadoop 1.x dependencies; you can find them in lib, most likely hadoop-core. A sketch of the steps follows.
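
A rough sketch of those steps, assuming a standard Nutch 2.x source checkout (the exact ivy.xml entries to edit depend on your Gora backend):

    # Sketch, assuming a Nutch 2.x source tree with the standard layout.
    # 1. In ivy/ivy.xml, bump the Hadoop dependency revisions to 2.5.1 and remove
    #    any org.apache.hadoop#hadoop-core entries (hadoop-core is Hadoop 1.x only).
    # 2. Rebuild the runtime:
    ant clean runtime
    # 3. Sanity check: no Hadoop 1.x jars should remain.
    ls runtime/local/lib | grep hadoop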