Solr facets ignore stopwords at query time

Question

I am using Solr 4.6.0 and I am trying to get the most frequent terms grouped by year. Because my stopwords may change often, I do not apply them at index time. Instead, all dynamic word lists (stopwords, protwords and synonyms) are applied at query time. However, although the stopword list includes terms like "of" and "the", they still appear in the facet results (see Results).

Question: How can I get faceted, stopword-filtered results if I use the StopFilterFactory only at query time?

Additional information

If I use the StopFilterFactory at index time, everything works as expected: terms like "of" and "the" are filtered out when I run my query.

I have also tested the fieldtype text_en with the Solr admin analysis tool, and the results are as expected: "of" and "the" are filtered out. Does that mean the SearchHandler somehow does not call the right analyzer?

Query

http://ip:port/solr/collection1/select?q=*:*&rows=0&facet=true&facet.pivot=year,text

Results

[..]
<lst name="facet_pivot">
  <arr name="year,text">
    <lst>
      <str name="field">year</str>
      <int name="value">2009</int>
      <int name="count">139</int>
      <arr name="pivot">
        <lst>
          <str name="field">text</str>
          <str name="value">of</str>
          <int name="count">135</int>
        </lst>
        <lst>
          <str name="field">text</str>
          <str name="value">the</str>
          <int name="count">135</int>
        </lst>
        <lst>
          <str name="field">text</str>
          <str name="value">and</str>
          <int name="count">123</int>
[..]

Schema.xml

<field name="year" type="int" indexed="true" stored="true" />
<field name="text" type="text_en" indexed="true" stored="true" multiValued="true" />
[..]
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

Could you expound a little on why your stop words change frequently? I'm wondering if a different approach is needed here. — Mark Leighton Fisher

Paige Cook · Accepted Answer · 2014-01-13T15:11:57

Please see the thread "does solr support query time only stopwords?" on the Solr mailing list.

This sounds very similar to your requirements. Their workaround was to enable the StopFilterFactory at index time, but without specifying a stopwords file, which got it working as expected.
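
For context, facet values are taken from the terms as they were indexed, so the query analyzer never touches them. That would explain why a query-time-only StopFilterFactory has no effect on the facet_pivot output. A minimal sketch of the workaround described above, applied to the text_en fieldtype from the question, might look like the following. The position of the stop filter in the index chain (right after the tokenizer, mirroring the query chain) is my assumption, not something stated in the thread. As far as I recall, when no words attribute is given the factory falls back to Lucene's built-in default English stopword set, which would account for "of" and "the" dropping out, but please verify this against your Solr version.

```xml
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Workaround: stop filter at index time with no words attribute.
         Placement directly after the tokenizer is an assumption. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <!-- Query analyzer unchanged from the question. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Any change to the index analyzer only affects documents indexed afterwards, so existing documents need to be reindexed before the facet counts reflect it.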
