I indexed a collection of archived websites in Solr for querying. As the unique key I use the URLs of the sites. I would like to use the url field in filter queries to limit a search to a certain domain when needed. For example, I want to query for "Barack Obama" but limit the results to the "whitehouse.gov" domain. This sounds like a pretty basic use case to me, but searches on the url field do not return any results at all. Here is my config (schema.xml):

 .
 .
 .
 <field name="collection" type="string" indexed="true" stored="true"/>
 <field name="content" type="text_de" indexed="true" stored="true" multiValued="true"/>
 <field name="date" type="string" indexed="true" stored="true"/>
 <field name="digest" type="string" indexed="true" stored="true"/>
 <field name="length" type="string" indexed="true" stored="true"/>
 <field name="segment" type="string" indexed="true" stored="true"/>
 <field name="site" type="string" indexed="true" stored="true"/>
 <field name="title" type="text_de" indexed="true" stored="true" multiValued="true"/>
 <field name="type" type="string" indexed="true" stored="true"/>
 <field name="url" type="text_en_splitting" indexed="true" stored="true"/>
 .
 .
 .

<!-- Field to use to determine and enforce document uniqueness. 
  Unless this field is marked with required="false", it will be a required field
-->
 <uniqueKey>url</uniqueKey>

And here is my query (simplified):

http://mysolrserver.com:8983/solr/select/?q=content:Barack+Obama&fq=url:whitehouse.gov

The query analyzer tells me that my query should match:

[Screenshot: Solr analysis page]

Does anyone have an idea why this is not working? I would highly appreciate any hints. Thanks a lot!

1 Answer

The fq=url:whitehouse.gov filter should work.

However, I see a problem with the query q=content:Barack+Obama.
What is your default search field?
Does removing the query component and using q=*:* return results for you?

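q=content:Barack+Obama is actually parsed as content:barack defaultsearchfield:obama, because the content: prefix binds only to the first term.
If the default search field does not contain obama, that clause matches nothing, so the query can come back empty (for instance when the default operator is AND).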
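
To make this concrete, here is a minimal sketch in Python that builds the request URLs for the checks described above. The host and the field names come from the question; the helper name build_url is made up for illustration. The field:(term1 term2) grouping and the quoted phrase form are standard Lucene/Solr query syntax, but whether they fix the empty result set on your index is an assumption until you try them.

```python
from urllib.parse import urlencode

# Base URL taken from the question; adjust it to your own Solr instance.
SOLR_SELECT = "http://mysolrserver.com:8983/solr/select/"


def build_url(q, fq=None):
    """Hypothetical helper: return a select URL with q and an optional fq."""
    params = [("q", q)]
    if fq is not None:
        params.append(("fq", fq))
    # urlencode percent-escapes ':', '(', ')', '"' and '*', and turns spaces into '+'.
    return SOLR_SELECT + "?" + urlencode(params)


# 1) Sanity check: match every document and apply only the url filter.
diagnostic_url = build_url("*:*", "url:whitehouse.gov")

# 2) Both terms scoped to the content field via grouping.
grouped_url = build_url("content:(Barack Obama)", "url:whitehouse.gov")

# 3) The two words as an exact phrase in the content field.
phrase_url = build_url('content:"Barack Obama"', "url:whitehouse.gov")

for url in (diagnostic_url, grouped_url, phrase_url):
    print(url)
```

If the first URL returns documents, the filter itself is fine and the problem is in the q parameter. If it returns nothing, the url field analysis is the place to look.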