I am new to Nutch and Solr, so I apologize in advance if this is a basic question.
Details of environment:
- Virtual Box with Guest OS: Ubuntu 12.04.4, Host OS: Windows 8
- Nutch Release: Apache Nutch 1.7
- Solr Release: Apache Solr 3.6.2
- Tutorial followed: wiki.apache.org/nutch/NutchTutorial
I started the crawl with this command:
bin/nutch crawl urls -solr http://mylocalhost:8983/solr/ -depth 3 -topN 5
This command succeeded with no errors.
After that, I opened the Solr admin page in a browser and ran a search with the default query string *:*. However, this returned a page with the content below:
This XML file does not appear to have any style information associated with it. The document tree is shown below.
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="start">0</str>
<str name="q">*:*</str>
<str name="rows">10</str>
<str name="indent">on</str>
<str name="version">2.2</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>
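As far as I can tell, the important part of this response is numFound="0", which would mean the query matched no documents in the index. Here is a minimal Python sketch, assuming the response above is pasted in as a string, that reads that count. The helper name num_found is my own for illustration and is not part of Solr or Nutch.

```python
import xml.etree.ElementTree as ET

# The Solr response shown above, pasted verbatim as a string.
SOLR_RESPONSE = """<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="start">0</str>
<str name="q">*:*</str>
<str name="rows">10</str>
<str name="indent">on</str>
<str name="version">2.2</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>"""


def num_found(xml_text):
    """Return the numFound attribute of the <result name="response"> element.

    Hypothetical helper for illustration; it only parses XML text and does
    not contact Solr.
    """
    root = ET.fromstring(xml_text)
    result = root.find("result[@name='response']")
    if result is None:
        raise ValueError("no <result name='response'> element in the XML")
    return int(result.get("numFound"))


if __name__ == "__main__":
    # Number of documents the query matched, according to the pasted response.
    print(num_found(SOLR_RESPONSE))
```

Running this on the response above only confirms what the XML already says. It does not explain why nothing was indexed.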
When I tried to search for 'nutch' in Solr, it returned an error: "HTTP Error 400".
Could you please help me see the data crawled by Nutch so that I can validate it?