2
votes

I am currently testing facet searches on a text field in my Solr schema and noticing that I am getting a significant number of results that are in my stopwords.txt file.

My schema is currently using the default configuration for the text data type, and I was under the impression that stopwords were not indexed if the "solr.StopFilterFactory" filter was in use.

I am hoping that someone can shed some light on this and either a) help me understand why stopwords don't apply to facets and how to live with it, or b) point me in the right direction so my facet queries don't return words from stopwords.

Thanks!

1
You shouldn't facet on tokenized fields. - Mauricio Scheffer
Hi, I have the same problem. My "keywords" are partially merged into one field, for example "car/dog/red/be/at" in one field and "blue/green/yellow" in the next. So I have to use a tokenizer to break the string into words: <tokenizer class="solr.PatternTokenizerFactory" pattern="/" />. Additionally, I use a stopword list to remove stopwords (like: at, be, ...). But the stopwords are also indexed and stored, and they are returned in a faceted search. I am using Solr 1.3. Is there any workaround for that? - The Bndr

1 Answer

1
votes

Stopwords do apply to facets. In other words, if you facet on a field whose index-time analyzer includes a stop filter, you should not see any stopwords in the facet.

My guess is that you are not indexing the way you think: either your schema.xml is wrong or you are indexing into a different field than you think.
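One way the schema can be "wrong" in this sense is a stop filter that only runs at query time. Facet values are taken from the indexed terms, so the filter has to be part of the index-time analyzer. Below is a minimal sketch of a field type with explicit index and query analyzers; the type name "text_stop_checked" and the file name "stopwords.txt" are assumptions for illustration, not taken from the question. Documents indexed before such a change keep their old terms until they are reindexed.

```xml
<!-- Hypothetical field type; the name "text_stop_checked" is illustrative only. -->
<fieldType name="text_stop_checked" class="solr.TextField" positionIncrementGap="100">
  <!-- Index-time chain: facet values come from the terms produced here. -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
  <!-- Query-time chain: a stop filter placed only here does not change facet values. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>
```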

I am using facets on a field of this type and it works well:

<fieldType name="text_ws_stop" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_spanish.txt"
            enablePositionIncrements="true"
    />
  </analyzer>
</fieldType>

...

<field name="phrases" type="text_ws_stop" indexed="true" stored="true" required="false"/>
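
For the slash-separated keywords described in the comment on the question, the same idea should carry over by swapping the tokenizer. This is only a sketch under assumptions: the type name "text_slash_stop", the field name "keywords", and the file name "stopwords.txt" are hypothetical, and the stopword file is assumed to contain entries such as "at" and "be".

```xml
<!-- Hypothetical type: split on "/" and then drop stopwords at index time. -->
<fieldType name="text_slash_stop" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- With only a pattern given, the tokenizer splits the input on each match. -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="/"/>
    <!-- Removes tokens listed in stopwords.txt (assumed file name). -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using the type above. -->
<field name="keywords" type="text_slash_stop" indexed="true" stored="true"/>
```

With a value like "car/dog/red/be/at", the stored value is still returned unchanged in search results, but the facet should only list the indexed terms that survive the stop filter. Existing documents need to be reindexed after the schema change.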