5
votes

I have a Solr instance with a number of documents and an indexed field.

I now want to apply a stopwords list to the query to increase the number of results, by completely ignoring at query time any word that appears in the list.

Thus in my configuration I'm using solr.StopFilterFactory in the query analyzer.

What I'm expecting is that if I search for a single word that is in the stopwords list, the result set is the same as for a wildcard query, text_title:*, that is, the full document set.

But instead I get 0 results. Am I missing something about the behaviour of the stopwords filter?


1 Answer

0
votes

solr.StopFilterFactory

This filter discards, or stops analysis of, tokens that are on the given stop words list. A standard stop words list is included in the Solr config directory, named stopwords.txt, which is appropriate for typical English language text.

https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-StopFilter

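This filter removes the tokens in your query that are on the stop words list; it does not replace them with *. A query made up only of stop words is therefore left with no terms to search for, which is why it does not behave like text_title:*.
Example: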

In: "To be or what?"
Tokenizer to Filter: "To"(1), "be"(2), "or"(3), "what"(4)
Out: "To"(1), "what"(4)
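
For reference, the trace above is what you would expect from a case-sensitive setup roughly like the sketch below. The tokenizer choice is an assumption, and the trace also assumes that stopwords.txt contains lowercase "to", "be" and "or". "To" survives because ignoreCase defaults to false, so it does not match the lowercase entry.

```xml
<!-- Sketch only: the tokenizer choice is an assumption, not taken from the question. -->
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <!-- ignoreCase defaults to false, so "To" does not match a lowercase "to" entry -->
  <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
</analyzer>
```

With ignoreCase="true" on the filter, "To" would be dropped as well and only "what"(4) would remain.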

Try this filter instead:
solr.SuggestStopFilterFactory

Like Stop Filter, this filter discards, or stops analysis of, tokens that are on the given stop words list. Suggest Stop Filter differs from Stop Filter in that it will not remove the last token unless it is followed by a token separator.

You would normally use the ordinary StopFilterFactory in your index analyzer and then SuggestStopFilter in your query analyzer.

In other words, this filter removes stop words from your query, except for a final stop word that is not followed by a token separator; that last token is kept.

How to use:

<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SuggestStopFilterFactory" ignoreCase="true" words="stopwords.txt" format="wordset"/>
</analyzer>

Example:

In: "The The"
Tokenizer to Filter: "the"(1), "the"(2)
Out: "the"(2)
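
Putting the recommendation together, a field type with the ordinary stop filter at index time and the suggest variant at query time might look roughly like this. It is only a sketch: the fieldType name text_suggest, the tokenizer choice and the field attributes are assumptions, and text_title is the field name from the question.

```xml
<!-- Sketch only: the fieldType name "text_suggest" and the tokenizer choice are assumptions. -->
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index time: drop every stop word -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Query time: drop stop words, but keep a final one not followed by a separator -->
    <filter class="solr.SuggestStopFilterFactory" ignoreCase="true" words="stopwords.txt" format="wordset"/>
  </analyzer>
</fieldType>

<!-- text_title is the field from the question; the attributes here are assumed -->
<field name="text_title" type="text_suggest" indexed="true" stored="true"/>
```

Keep in mind that a stop word kept at query time is then searched like any other term, so whether it matches anything depends on what was actually indexed.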