I'm setting up a spellchecker in Solr and I'm having an issue with case. Changing the case of the query doesn't affect the number of documents returned, but it does change the spellchecker output. For example, if I search for 'leave' I get 7 documents and no spellcheck suggestions. If I search for 'Leave' I still get 7 documents, but now the spellcheck section contains this:
"spellcheck":{
  "suggestions":[
    "Leave",{
      "numFound":3,
      "startOffset":0,
      "endOffset":5,
      "origFreq":0,
      "suggestion":[{
          "word":"leave",
          "freq":7},
        {
          "word":"lease",
          "freq":4},
        {
          "word":"travel",
          "freq":2}]}],
  "correctlySpelled":true,
  "collations":[
    "collation",{
      "collationQuery":"leave",
      "hits":7,
      "misspellingsAndCorrections":[
        "Leave","leave"]}]}
So it suggests the lowercase 'leave' and reports an origFreq of 0 for 'Leave', even though 'correctlySpelled' is still true. Here are the fields and field types from my schema.xml:
<field name="title" type="text_en" indexed="true" stored="true" multiValued="false" />
<field name="filename" type="string" indexed="true" stored="true" multiValued="false" />
<field name="filext" type="string" indexed="true" stored="true" multiValued="false" />
<field name="version" type="int" indexed="false" stored="true" multiValued="false" />
<field name="docSet" type="string" indexed="true" stored="true" multiValued="false" />
<field name="businessArea" type="string" indexed="true" stored="true" multiValued="false" />
<field name="processGroup" type="string" indexed="true" stored="true" multiValued="false" />
<field name="applicability" type="string" indexed="true" stored="true" multiValued="true" />
<field name="content" type="text_en" indexed="true" stored="true" multiValued="false" />
<field name="lastIndex" type="int" indexed="true" stored="true" multiValued="false" />
<field name="popularity" type="int" indexed="true" stored="true" multiValued="false" default="1"/>
<field name="speller" type="speller_type" indexed="true" stored="true" multiValued="true" />
<copyField source="*" dest="speller"/>
<fieldType name="speller_type" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"/>
  </analyzer>
</fieldType>
And these are the spellchecking parts of my solrconfig.xml:
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    ...
    <!--****************************************************************
     * Spellcheck configuration
     *****************************************************************-->
    <str name="spellcheck">on</str>
    <!-- Suggestions -->
    <str name="spellcheck.count">10</str>
    <!-- <str name="spellcheck.maxResultsForSuggest">10</str> -->
    <str name="spellcheck.extendedResults">true</str>
    <!-- Collations -->
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.collateMaxCollectDocs">0</str>
    ...
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="field">speller</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
If I'm applying a lowercase filter to the speller field in both the index and query analyzers, why would changing the case of the search term change the spellchecker results? I've looked for solutions to this but haven't found anything that fixes it.
Thanks for any help.
EDIT: I get the same problem with stopwords: they're not being applied. Even though 'for' is a stopword in stopwords.txt and I'm applying the stop filter to the speller fieldType, if I search for 'leave for application' it suggests 'leave form application' as a collation query. Why aren't the stopwords being removed?
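One setting that may be relevant to both symptoms, shown only as an untested sketch: the SpellCheckComponent accepts a queryAnalyzerFieldType entry naming the field type whose query analyzer is applied to the incoming terms before they are looked up in the spelling dictionary. The assumption here, not verified against this setup, is that without it the raw terms 'Leave' and 'for' reach the spellchecker without passing through the lowercase and stop filters of speller_type. Only the queryAnalyzerFieldType line and its comment are new; the rest is copied from the solrconfig.xml above.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- Assumption: run spellcheck query terms through speller_type's query analyzer
       (lowercasing and stopword removal) before consulting the dictionary -->
  <str name="queryAnalyzerFieldType">speller_type</str>
  <lst name="spellchecker">
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="field">speller</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

To test a change like this, the core would probably need to be reloaded and the spellcheck index rebuilt (for example with spellcheck.build=true on a request) before comparing 'Leave' against 'leave' again.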