We are using Solr 3.5 with a schema that has the following field type declaration:

<fieldType name="fieldN" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" 
            catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="256"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"
            />
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="256"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"
            />
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

When we send a query like this:

field1:"term1"

Solr returns results.

When we run this query we still get results:

field1:"term1" AND (field2:term2 OR field3:term2)

Here term2 is a stop word and term1 is a regular word.

But when we send a query like this:

field1:"term1" AND (field2:term2 OR field3:term2 OR field4:term2)

Nothing is returned.

We also noticed that a query like this:

(field1:"term1" AND (field2:term2 OR field3:term2)) OR (field1:"term1" AND field4:term2)

works too, but since the real query has to search for one term in about 200 fields, this option is less preferred.

Thanks.

And what is your expected result? Since term2 is a stopword, shouldn't you expect no results for both the 2nd and 3rd queries? Anyway, the first step you should take is to inspect your index with Luke, just to be sure what exactly you are querying against. - Marko Topolnik
I was expecting that the stop word part would not affect the results: a query like term1 term2 should return all the documents that match term1 when term2 is a stop word. I will try the tests, thanks. - Noam
Yes, analyzing a stopword produces no tokens so the entire query term should be as if not there. But I think this presents a challenge to the QueryParser. Your 2nd query is a BooleanQuery of two clauses where the right clause is an inner BooleanQuery. This inner query turns out to have no clauses if term2 is a stopword, so Lucene is left with an empty BooleanQuery. I wonder how it handles that. There was (a long time ago, but still) a JIRA issue about exactly this case. - Marko Topolnik

1 Answer

I am guessing that your 'weirdness' has more to do with your solrconfig rules than with the stopwords in your query. I have experienced similar issues with stopword queries inside subqueries, and the cause turned out to be the Minimum Match rules in my Dismax search handler.

Look inside your solrconfig.xml and find the requestHandler your search is using. It should declare an "mm" (Minimum Match) string. Try adjusting your rules so they are more or less restrictive, depending on your goal.
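
For reference, here is a rough sketch of what such a declaration might look like. It assumes the handler uses the dismax or edismax query parser, since those are the parsers that read "mm". The handler name and the qf field list are made up for illustration and are not taken from your configuration.

```xml
<!-- Hypothetical handler: the name "/search" and the qf field list are
     illustrative only and do not come from the original configuration. -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- mm is read by the dismax and edismax query parsers -->
    <str name="defType">edismax</str>
    <str name="qf">field1 field2 field3 field4</str>
    <!-- Lenient setting: at least one optional clause has to match -->
    <str name="mm">1</str>
  </lst>
</requestHandler>
```

If your handler has a stricter value (for example a high percentage), loosening it towards something like the above is a quick way to check whether mm is what drops your results.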

Best of luck!