I changed some of my fields from text_general to text_en, hoping to take advantage of stemming and some other improvements, but unfortunately the change has broken highlighting. It now seems to highlight only words that stemming leaves unchanged (i.e. words whose stemmed form is the same as the word itself, like "child").
I'm using the default fieldType definition:
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
    />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
    />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
I enable highlighting with hl.fl=title&hl=true in my query. This is also a faceted search, if that matters.
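For reference, the request looks roughly like the sketch below. The host, port, core name (`mycore`), query term, and facet field (`category`) are placeholders made up for illustration, not values from my actual setup; only `hl=true` and `hl.fl=title` come from the description above.

```python
from urllib.parse import urlencode

# Hypothetical base URL: host, port, and core name are placeholders.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_highlight_query(user_query, highlight_field="title"):
    """Build a select URL with highlighting enabled on a single field."""
    params = [
        ("q", user_query),
        ("hl", "true"),               # turn highlighting on
        ("hl.fl", highlight_field),   # field to generate snippets for
        ("facet", "true"),            # the search is also faceted
        ("facet.field", "category"),  # hypothetical facet field
        ("wt", "json"),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_highlight_query("title:strides"))
```

The facet parameters are included only because the real query is faceted; they are not expected to influence highlighting.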
In this case, as I said, only words that stemming leaves unchanged, like "child", are highlighted. If I remove the stemming filter from the index analyzer in the text_en definition (the index analyzer only; removing it from the query analyzer seems to have no effect), all matched words except stopwords are highlighted. Furthermore, if I change text_en to use the EnglishMinimalStemFilterFactory instead, more words are highlighted, which I assume is because they are stemmed by the Porter stemmer but not by the minimal one. An example of such a word is "strides".
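One way to compare what each analyzer chain emits for a word like "strides" is Solr's field analysis handler. This is a minimal sketch, assuming the handler is reachable at the `/analysis/field` path on the core; the host, port, and core name (`mycore`) are placeholders, and the helper name is made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical base URL: host, port, and core name are placeholders.
SOLR_ANALYSIS_URL = "http://localhost:8983/solr/mycore/analysis/field"


def build_analysis_url(field_type, indexed_text, query_text):
    """Build a URL that runs text through a field type's index and query analyzers."""
    params = [
        ("analysis.fieldtype", field_type),     # e.g. text_en
        ("analysis.fieldvalue", indexed_text),  # run through the index analyzer
        ("analysis.query", query_text),         # run through the query analyzer
        ("analysis.showmatch", "true"),         # flag index tokens that match query tokens
        ("wt", "json"),
    ]
    return SOLR_ANALYSIS_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Compare a word that is highlighted with one that is not.
    for word in ("child", "strides"):
        print(build_analysis_url("text_en", word, word))
```

Opening the printed URLs shows the token stream after each filter, which should make it visible whether the index-time and query-time stems actually agree for the words that fail to highlight.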
Does anyone know what's going on?