0
votes

I can't seem to prevent the Solr spellcheck component from delimiting words by characters. This is the field which is the basis of my spelling suggestions:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>

And this is my main general use field type:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

When I run the query:

skiny jen\"as

I am returned the collated spelling suggestion of:

"skinny jeans\\\"wash"

This seems odd because the query has been broken into skiny, jen and as but collated together in this format. When I use the Solr analyzer to see what's being done to my query, these are the end results I get for the two field types (which give me what I would expect):

text_en: skini | jen\"a

textSpell: skiny | jen\"as

So with this in mind, why are jen and as being processed seperately in the token jen\"as?

1

1 Answers

0
votes

The answer is to specify spellcheck.q and q together. That way the main query results are based on q but the spelling suggestion on spellcheck.q. It looks like Solr was tokenizing and applying some filters to q.