I am using Solr 3.6.1 and Nutch 1.5, and until recently it worked fine: I crawl my site, index the data into Solr, and use Solr search. Two weeks ago it stopped working.
When I run `./nutch crawl urls -solr http://localhost:8080/solr/ -depth 5 -topN 100` it works, but when I run `./nutch crawl urls -solr http://localhost:8080/solr/ -depth 5 -topN 100000` it throws an exception. In my log file I found this:
2013-02-05 17:04:20,697 INFO solr.SolrWriter - Indexing 250 documents
2013-02-05 17:04:20,697 INFO solr.SolrWriter - Deleting 0 documents
2013-02-05 17:04:21,275 WARN mapred.LocalJobRunner - job_local_0029
org.apache.solr.common.SolrException: Internal Server Error
Internal Server Error
request: `http://localhost:8080/solr/update?wt=javabin&version=2`
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:124)
at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:55)
at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:457)
at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:497)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:195)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:51)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
2013-02-05 17:04:21,883 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
2013-02-05 17:04:21,887 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2013-02-05 17:04:21
2013-02-05 17:04:21,887 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: `http://localhost:8080/solr/`
Two weeks ago it worked well. Has anybody run into a similar problem?
Hi, I just finished crawling and got the same exception, but when I looked at my log/hadoop.log file, I found this:
2013-02-06 22:02:14,111 INFO solr.SolrWriter - Indexing 250 documents
2013-02-06 22:02:14,111 INFO solr.SolrWriter - Deleting 0 documents
2013-02-06 22:02:14,902 WARN mapred.LocalJobRunner - job_local_0019
org.apache.solr.common.SolrException: Bad Request
Bad Request
request: `http://localhost:8080/solr/update?wt=javabin&version=2`
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:124)
at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:55)
at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:457)
at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:497)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:304)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
2013-02-06 22:02:15,027 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
2013-02-06 22:02:15,032 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2013-02-06 22:02:15
2013-02-06 22:02:15,032 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: `http://localhost:8080/solr/`
2013-02-06 22:02:21,281 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2013-02-06 22:02:22,263 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: finished at 2013-02-06 22:02:22, elapsed: 00:00:07
2013-02-06 22:02:22,263 INFO crawl.Crawl - crawl finished: crawl-20130206205733
I hope this helps to understand the problem.
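In case it helps to compare the two runs (one failed with Internal Server Error, the other with Bad Request), here is a minimal Python sketch that tallies the SolrException messages in a set of log lines. It is not part of Nutch or Solr; the function name `count_solr_errors` is made up for illustration, and it assumes the exception line looks exactly like the ones in the excerpts above.

```python
import re
from collections import Counter

# Matches the exception line that follows the LocalJobRunner warning, e.g.
# "org.apache.solr.common.SolrException: Bad Request".
# Assumption: the message is everything after the colon on that line.
SOLR_ERROR = re.compile(r"^org\.apache\.solr\.common\.SolrException:\s*(.+)$")


def count_solr_errors(lines):
    """Count SolrException messages in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = SOLR_ERROR.match(line.strip())
        if match:
            counts[match.group(1).strip()] += 1
    return counts


# A few lines copied from the excerpts above, used as sample input.
sample = [
    "2013-02-05 17:04:21,275 WARN mapred.LocalJobRunner - job_local_0029",
    "org.apache.solr.common.SolrException: Internal Server Error",
    "Internal Server Error",
    "2013-02-06 22:02:14,902 WARN mapred.LocalJobRunner - job_local_0019",
    "org.apache.solr.common.SolrException: Bad Request",
    "Bad Request",
]

for message, n in count_solr_errors(sample).most_common():
    print(n, message)
```

To run it against the real log, pass an open file object instead of `sample`, since a file iterates line by line. The tally only shows which HTTP errors Solr returned and how often; the actual cause should be in the Solr server log for the same timestamps.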