I am pretty new to Nutch, so bear with me. I have been encountering an IOException during one of my test crawls. I am using Nutch 1.6 with Hadoop 0.20.2 (I chose this version for Windows compatibility when setting file access rights).
I am running Nutch through Eclipse. I followed this guide to import Nutch from SVN: http://wiki.apache.org/nutch/RunNutchInEclipse
My crawler code is based on this tutorial: http://cmusphinx.sourceforge.net/2012/06/building-a-java-application-with-apache-nutch-and-solr/
Here is the console output with the exception:
solrUrl is not set, indexing will be skipped...
crawl started in: crawl
rootUrlDir = urls
threads = 1
depth = 1
solrUrl=null
topN = 1
Injector: starting at 2013-03-31 23:51:11
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
at org.apache.nutch.crawl.Injector.inject(Injector.java:
at org.apache.nutch.crawl.Crawl.run(Crawl.java:
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:
at rjpb.sp.crawler.CrawlerTest.main(CrawlerTest.java:51)
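For reference, CrawlerTest.java (the class at the bottom of the stack trace) boils down to something like the following. This is a simplified sketch based on the tutorial above, not my exact code, but the argument values match the settings printed in the log:

package rjpb.sp.crawler;

import org.apache.hadoop.util.ToolRunner;
import org.apache.nutch.crawl.Crawl;
import org.apache.nutch.util.NutchConfiguration;

public class CrawlerTest {
    public static void main(String[] args) throws Exception {
        // Same settings as in the log above: seed dir "urls", output dir "crawl",
        // 1 thread, depth 1, topN 1, no solrUrl
        String[] crawlArgs = {"urls", "-dir", "crawl", "-threads", "1",
                              "-depth", "1", "-topN", "1"};
        int res = ToolRunner.run(NutchConfiguration.create(), new Crawl(), crawlArgs);
        System.exit(res);
    }
}

As the stack trace shows, the exception comes out of the Injector's MapReduce job (JobClient.runJob), before any fetching starts.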
I see these path-related calls before Injector.inject() in Crawl.java:
Path crawlDb = new Path(dir + "/crawldb");
Path linkDb = new Path(dir + "/linkdb");
Path segments = new Path(dir + "/segments");
Path indexes = new Path(dir + "/indexes");
Path index = new Path(dir + "/index");
Currently my Eclipse project does not include the folders crawldb, linkdb, segments, etc. I think my problem is that I have not set up all the files necessary for crawling. I have only set up nutch-site.xml, regex-urlfilter.txt, and urls/seed.txt (roughly sketched below).
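My nutch-site.xml is minimal, essentially just the agent name, and urls/seed.txt contains a single seed URL. Both are paraphrased here (the property value and the URL are placeholders, not my real ones):

nutch-site.xml:
<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>my-test-crawler</value>
  </property>
</configuration>

urls/seed.txt:
http://example.com/

Any advice on the matter would be of great help. Thanks!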