
When I run

hadoop jar apache-nutch-2.2.jar org.apache.nutch.crawl.Crawler crawl -dir crawl -depth 3 -topN 5

I get the following error:

13/07/09 09:02:46 WARN conf.Configuration: nutch-default.xml:a attempt to override final parameter: hadoop.job.history.user.location;  Ignoring.
13/07/09 09:02:46 WARN conf.Configuration: nutch-default.xml:a attempt to override final parameter: hadoop.job.history.user.location;  Ignoring.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/gora/persistency/impl/PersistentBase
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
    at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:218)
    at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.ClassNotFoundException: org.apache.gora.persistency.impl.PersistentBase
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    ... 22 more

Can somebody help me fix this error?


Answer


You have a problem with the Gora dependency. Most likely apache-nutch-2.2.jar does not contain org/apache/gora/persistency/impl/PersistentBase.class. You can check with:

jar tf apache-nutch-2.2.jar | grep PersistentBase
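If the jar tool is not handy, the same check can be done from Java. This is a minimal sketch (the class name JarEntryGrep is hypothetical) that uses the standard java.util.jar.JarFile API to list the archive entries whose names contain a given substring, roughly what jar tf ... | grep prints. It also works on a .job file, since that is a zip archive as well.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Lists archive entries whose names contain a substring, like `jar tf archive | grep needle`. */
public class JarEntryGrep {

    static List<String> findEntries(String archivePath, String needle) throws IOException {
        List<String> matches = new ArrayList<>();
        // try-with-resources closes the archive when we are done reading its entries.
        try (JarFile jar = new JarFile(archivePath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.contains(needle)) {
                    matches.add(name);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the jar from the question; pass another path as the first argument.
        String archive = args.length > 0 ? args[0] : "apache-nutch-2.2.jar";
        List<String> hits = findEntries(archive, "PersistentBase");
        if (hits.isEmpty()) {
            System.out.println("PersistentBase not found in " + archive);
        } else {
            hits.forEach(System.out::println);
        }
    }
}
```

Usage: java JarEntryGrep apache-nutch-2.2.jar prints nothing but the "not found" line if the Gora classes are missing from the jar.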

Also check that you compiled Nutch against Gora 0.3.

I guess you don't have the gora-* dependencies installed on your Hadoop nodes, so one solution is to ship them using the .job file (instead of the .jar), which has all dependencies bundled for Hadoop.

If your installation looks like this:

~
|- nutch/
      |- apache-nutch-2.2.job
      |- bin/
           |- nutch

and PATH=~/nutch/bin:.....

you can run Nutch simply with:

$ nutch inject ...

$ nutch crawl

and the nutch command calls Hadoop when needed.
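As a rough illustration of that behaviour, here is a simplified Java sketch of what the launcher does when it finds a .job file: it maps the subcommand to a main class and builds a hadoop jar &lt;job&gt; &lt;class&gt; &lt;args&gt; command line. The real bin/nutch is a shell script with many more commands and a local mode as well; the class NutchLauncherSketch and its two-entry command table are hypothetical, and they only use the two Nutch class names that appear in the stack trace in the question.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of how a launcher could turn `nutch <cmd>` into `hadoop jar`. */
public class NutchLauncherSketch {

    // Assumed subset of the subcommand-to-class mapping (not the full bin/nutch table).
    private static final Map<String, String> COMMANDS = Map.of(
            "crawl", "org.apache.nutch.crawl.Crawler",
            "inject", "org.apache.nutch.crawl.InjectorJob");

    /** Returns the first *.job file in the Nutch home directory, or null if there is none. */
    static File findJob(File nutchHome) {
        File[] jobs = nutchHome.listFiles((dir, name) -> name.endsWith(".job"));
        return (jobs == null || jobs.length == 0) ? null : jobs[0];
    }

    /** Builds the command line; the .job (with its bundled lib/ jars) is what Hadoop receives. */
    static List<String> buildCommand(File nutchHome, String command, String... args) {
        String mainClass = COMMANDS.get(command);
        if (mainClass == null) {
            throw new IllegalArgumentException("unknown command: " + command);
        }
        File job = findJob(nutchHome);
        if (job == null) {
            throw new IllegalStateException("no .job file in " + nutchHome);
        }
        List<String> cmd = new ArrayList<>(Arrays.asList("hadoop", "jar", job.getPath(), mainClass));
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    public static void main(String[] args) {
        File home = new File(System.getProperty("user.home"), "nutch");
        System.out.println(String.join(" ", buildCommand(home, "crawl", args)));
    }
}
```

The point of the sketch is the design choice: because the launcher hands Hadoop the .job rather than the plain .jar, the Gora jars travel with the job instead of having to be installed on every node.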

============ updated ==============

Offending line (InjectorJob.java:218 in the stack trace): http://grepcode.com/file/repo1.maven.org/maven2/org.apache.nutch/nutch/2.2.1/org/apache/nutch/crawl/InjectorJob.java/#218

============ update 2 =============

You are invoking Nutch with the command line:

hadoop jar nutch...jar

If you do that, you must make sure that gora-core-0.x.jar is on the classpath.
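One way to confirm this is to run a tiny check with exactly the classpath your job uses. The sketch below (hypothetical class name GoraClasspathCheck) asks the class loader for org.apache.gora.persistency.impl.PersistentBase without initializing it and reports whether it can be loaded. NoClassDefFoundError is a subclass of LinkageError, so the check treats it as "not loadable" just like ClassNotFoundException.

```java
/** Reports whether a class can be loaded from the current classpath. */
public class GoraClasspathCheck {

    static boolean isLoadable(String className) {
        try {
            // initialize=false: locate and define the class, but don't run its static initializers.
            Class.forName(className, false, GoraClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: class file not found.
            // LinkageError (incl. NoClassDefFoundError): found, but a class it depends on is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        String cls = args.length > 0 ? args[0] : "org.apache.gora.persistency.impl.PersistentBase";
        System.out.println(cls + (isLoadable(cls) ? " is" : " is NOT") + " on the classpath");
    }
}
```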

If you invoke a .job instead, it must contain lib/gora-core-0.x.jar inside the zip. Hadoop unpacks the .job and adds lib/* to the classpath, so nothing else should be necessary.
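To verify that a .job archive really bundles the Gora jars, you can list its lib/ entries. This is a minimal sketch using java.util.zip.ZipFile; the class name JobLibCheck and the lib/gora- prefix match are assumptions based on the layout described above, not part of Nutch or Hadoop.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Lists the gora-*.jar files found directly under lib/ inside a .job (zip) archive. */
public class JobLibCheck {

    static List<String> bundledGoraJars(String jobPath) throws IOException {
        List<String> found = new ArrayList<>();
        try (ZipFile zip = new ZipFile(jobPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                // Keep lib/gora-*.jar only; indexOf('/', 4) rejects anything nested deeper than lib/.
                if (name.startsWith("lib/gora-") && name.endsWith(".jar") && name.indexOf('/', 4) < 0) {
                    found.add(name);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        String job = args.length > 0 ? args[0] : "apache-nutch-2.2.job";
        List<String> jars = bundledGoraJars(job);
        if (jars.isEmpty()) {
            System.out.println("No lib/gora-*.jar found in " + job);
        } else {
            jars.forEach(System.out::println);
        }
    }
}
```

Usage: java JobLibCheck apache-nutch-2.2.job should print something like lib/gora-core-0.x.jar if the .job was built with its dependencies.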