
I am using Solr 3.6.1 and Nutch 1.5, and until recently everything worked fine: I crawl my site, index the data into Solr, and use Solr for search. Two weeks ago it stopped working. When I run `./nutch crawl urls -solr http://localhost:8080/solr/ -depth 5 -topN 100` it works, but when I run `./nutch crawl urls -solr http://localhost:8080/solr/ -depth 5 -topN 100000` it throws an exception. The two invocations are shown below, followed by what I found in my log file.
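
The only difference between the two commands is `-topN`, which caps how many top-scoring URLs Nutch fetches in each crawl round (`-depth` is the number of rounds); the Solr URL and the `urls` seed directory are just from my setup:

    # works: 5 rounds, at most 100 URLs per round
    ./nutch crawl urls -solr http://localhost:8080/solr/ -depth 5 -topN 100

    # fails during the SolrIndexer step: same crawl, up to 100000 URLs per round
    ./nutch crawl urls -solr http://localhost:8080/solr/ -depth 5 -topN 100000

Here is the log excerpt from the failing run: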

2013-02-05 17:04:20,697 INFO  solr.SolrWriter - Indexing 250 documents
2013-02-05 17:04:20,697 INFO  solr.SolrWriter - Deleting 0 documents
2013-02-05 17:04:21,275 WARN  mapred.LocalJobRunner - job_local_0029
org.apache.solr.common.SolrException: Internal Server Error

Internal Server Error

request: `http://localhost:8080/solr/update?wt=javabin&version=2`
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:124)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:55)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:457)
    at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:497)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:195)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:51)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
2013-02-05 17:04:21,883 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
2013-02-05 17:04:21,887 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2013-02-05 17:04:21
2013-02-05 17:04:21,887 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: `http://localhost:8080/solr/`    

It worked well two weeks ago... Has anybody run into a similar problem?

Hi, I just finished another crawl and hit the same exception, but when I looked at my logs/hadoop.log file, I found this:

2013-02-06 22:02:14,111 INFO  solr.SolrWriter - Indexing 250 documents
2013-02-06 22:02:14,111 INFO  solr.SolrWriter - Deleting 0 documents
2013-02-06 22:02:14,902 WARN  mapred.LocalJobRunner - job_local_0019
org.apache.solr.common.SolrException: Bad Request

Bad Request

request: `http://localhost:8080/solr/update?wt=javabin&version=2`
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:124)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:55)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:457)
    at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:497)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:304)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
2013-02-06 22:02:15,027 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
2013-02-06 22:02:15,032 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2013-02-06 22:02:15
2013-02-06 22:02:15,032 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: `http://localhost:8080/solr/`
2013-02-06 22:02:21,281 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
2013-02-06 22:02:22,263 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: finished at 2013-02-06 22:02:22, elapsed: 00:00:07
2013-02-06 22:02:22,263 INFO  crawl.Crawl - crawl finished: crawl-20130206205733 

I hope this helps to understand the problem.

It seems the map-reduce job failed; you can check the Hadoop log for more details. - Jayendra
I edited the post and added the last part of the log file... Thanks for the answer. - Hayk Grigoryan
Solr is probably getting a malformed request. The Solr log will have the details of the bad request and the underlying issue. - Jayendra

1 Answer


With the logs you showed, I think the answer will be on the Solr side. You should have an exception trace there that tells you which component stopped the processing. And since it worked two weeks ago, either something changed (jar versions?) or a particular document is causing the problem.

If the problem also happens with a single document (try a couple of different ones), then you probably have some environment change (jars, properties, etc.). If it does not happen with one subset of documents but does with another, there is probably an issue with a specific document (e.g. wrong encoding); one way to isolate this is sketched below.
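
A sketch of segment-by-segment indexing: skip the all-in-one `crawl` command and run Nutch's `solrindex` job on one segment at a time. The crawl directory name is taken from your log (`crawl-20130206205733`), the segment name is a placeholder, and the exact argument order can vary between Nutch versions (running `./nutch solrindex` with no arguments prints the usage):

    # index a single segment against the same Solr instance
    # (crawl-20130206205733 is the crawl dir from the log above; pick any one segment)
    ./nutch solrindex http://localhost:8080/solr/ crawl-20130206205733/crawldb \
        -linkdb crawl-20130206205733/linkdb \
        crawl-20130206205733/segments/<one-segment-dir>

    # repeat for the other directories under crawl-20130206205733/segments/;
    # the segment that fails points to the problem documents

If every segment fails, an environment or configuration change is more likely; if only some do, look at the documents in those segments.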

Again, the Solr-side stack trace is the first thing to check.
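
If Solr is deployed in Tomcat (port 8080 suggests it is), the server-side trace usually ends up in the container log. A minimal sketch, assuming a default Tomcat layout (the path is a guess, adjust it to your install):

    # watch the container log while re-running the failing command (location is an assumption)
    tail -f $CATALINA_HOME/logs/catalina.out

    # or search for the Solr-side exception after the fact
    grep -B 2 -A 30 "SolrException" $CATALINA_HOME/logs/catalina.out | less

The exception there, for example an unknown field or an invalid field value behind the "Bad Request", will name the component and usually the offending field or document.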