I'm running a cluster of five Cubieboards, Raspberry Pi-like ARM boards, with Hadoop 1.2.1 installed on them (1.2.1 because the boards are 32-bit). There is one NameNode and four slave nodes.
For my final paper I want to set up Apache Nutch 1.9 and Solr for big data analysis. I followed the setup described here: http://wiki.apache.org/nutch/NutchHadoopTutorial#Deploy_Nutch_to_Multiple_Machines
When submitting the job file to deploy Nutch over the whole cluster, I get a ClassNotFoundException, because the Crawl class no longer exists since Nutch 1.7 (http://wiki.apache.org/nutch/bin/nutch%20crawl); it has already been removed from the source as well.
The following error is shown:
hadoop jar apache-nutch-1.9.job org.apache.nutch.crawl.Crawl urls -dir crawl -depth 3 -topN 5
Warning: $HADOOP_HOME is deprecated.

Exception in thread "main" java.lang.ClassNotFoundException: org.apache.nutch.crawl.Crawl
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:266)
Other classes I found in the package seem to work, so there should be no problem with my environment settings.
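For reference, the .job file is an ordinary JAR/ZIP archive, so the classes it ships can be listed directly (the path is from my setup):

jar tf apache-nutch-1.9.job | grep 'org/apache/nutch/crawl/'
# Injector, Generator, CrawlDb etc. are listed, but there is no Crawl.class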
What alternatives are there to perform a crawl over the whole cluster? Since Nutch 2.0 there is a Crawler class, but not in 1.9 :(
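One workaround I could imagine (untested on my cluster, and the arguments are taken from the 1.x wiki docs, so they may need adjusting) is chaining the individual job classes that the old Crawl class used to wrap, the same way the local bin/crawl script does, repeating the generate/fetch/parse/updatedb steps once per depth level:

hadoop jar apache-nutch-1.9.job org.apache.nutch.crawl.Injector crawl/crawldb urls
hadoop jar apache-nutch-1.9.job org.apache.nutch.crawl.Generator crawl/crawldb crawl/segments -topN 5
hadoop jar apache-nutch-1.9.job org.apache.nutch.fetcher.Fetcher crawl/segments/<segment_dir>
hadoop jar apache-nutch-1.9.job org.apache.nutch.parse.ParseSegment crawl/segments/<segment_dir>
hadoop jar apache-nutch-1.9.job org.apache.nutch.crawl.CrawlDb crawl/crawldb crawl/segments/<segment_dir>
hadoop jar apache-nutch-1.9.job org.apache.nutch.crawl.LinkDb crawl/linkdb crawl/segments/<segment_dir>

Here <segment_dir> stands for the timestamped directory that the Generator step creates under crawl/segments. Is this the intended way in 1.9, or is there a cleaner option that still runs on the whole cluster?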
Any help is greatly appreciated. Thank you.