
I am pretty new to Nutch, so bear with me. I have been encountering an IOException during one of my test crawls. I am using Nutch 1.6 with Hadoop 0.20.2 (I chose this Hadoop version for Windows compatibility when setting file access rights).

I am running Nutch through Eclipse. I followed this guide to import Nutch from SVN: http://wiki.apache.org/nutch/RunNutchInEclipse

My crawler's code is based on this tutorial: http://cmusphinx.sourceforge.net/2012/06/building-a-java-application-with-apache-nutch-and-solr/
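
Roughly, my CrawlerTest driver boils down to the sketch below. This is a simplified paraphrase rather than my exact file: the package and class name come from the stack trace further down, and the argument values match the settings printed in the log, but the body is just how I understand the tutorial's approach of running Nutch's Crawl tool through ToolRunner.

package rjpb.sp.crawler;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.nutch.crawl.Crawl;
import org.apache.nutch.util.NutchConfiguration;

public class CrawlerTest {
    public static void main(String[] args) throws Exception {
        // Loads nutch-default.xml and nutch-site.xml from the classpath
        Configuration conf = NutchConfiguration.create();

        // Seed dir "urls", output dir "crawl", 1 thread, depth 1, topN 1, no Solr URL
        String[] crawlArgs = {"urls", "-dir", "crawl", "-threads", "1", "-depth", "1", "-topN", "1"};

        // Crawl implements Tool, so it is run through Hadoop's ToolRunner
        int res = ToolRunner.run(conf, new Crawl(), crawlArgs);
        System.exit(res);
    }
}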

Here is the console output, ending with the exception:

solrUrl is not set, indexing will be skipped...
crawl started in: crawl
rootUrlDir = urls
threads = 1
depth = 1
solrUrl=null
topN = 1
Injector: starting at 2013-03-31 23:51:11
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.

java.io.IOException: Job failed! 
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252) 
    at org.apache.nutch.crawl.Injector.inject(Injector.java:
    at org.apache.nutch.crawl.Crawl.run(Crawl.java:
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:
    at rjpb.sp.crawler.CrawlerTest.main(CrawlerTest.java:51)

In Crawl.java, I see these path-related calls before Injector.inject():

Path crawlDb = new Path(dir + "/crawldb"); 
Path linkDb = new Path(dir + "/linkdb"); 
Path segments = new Path(dir + "/segments"); 
Path indexes = new Path(dir + "/indexes"); 
Path index = new Path(dir + "/index");

Currently my Eclipse project does not include the folders crawldb, linkdb, segments, and so on. I think my problem is that I have not set up all the files necessary for crawling. I have only set up nutch-site.xml, regex-urlfilter.txt, and urls/seed.txt (I've pasted roughly what my nutch-site.xml contains below). Any advice on the matter would be a great help. Thanks!
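
For completeness, my nutch-site.xml contains roughly the following. The values here are placeholders rather than my real settings; plugin.folders points at the plugin directory built from my Nutch checkout, as I understood from the RunNutchInEclipse guide.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Nutch requires an agent name to identify the crawler; placeholder value -->
  <property>
    <name>http.agent.name</name>
    <value>MyTestCrawler</value>
  </property>
  <!-- When running from Eclipse, Nutch needs to find its plugins on disk; placeholder path -->
  <property>
    <name>plugin.folders</name>
    <value>C:/nutch-1.6/build/plugins</value>
  </property>
</configuration>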


1 Answer


I didn't have much success when I tried running Nutch 1.6 on Windows. I downloaded the latest version known to run on Windows (Nutch 1.2) and didn't have any problems with that.