Django Haystack Solr autocomplete with numbers not working

Question

I almost have autocomplete working using Haystack with Solr, but it fails when the tag I'm trying to match starts with a single digit.

I have these tags:

"8th Grade"
"9th Grade"
"10th Grade"

This is my query and Haystack definition:

# Query
from haystack.query import SQ, SearchQuerySet

tags = SearchQuerySet().models(Tag).filter(SQ(name_auto=autocomplete_string))

# Index definition
from haystack import indexes

class TagIndex(indexes.SearchIndex, indexes.Indexable):
    name = indexes.CharField(model_attr='name', faceted=True)
    name_auto = indexes.EdgeNgramField(model_attr='name')

autocomplete_string = "10" works.
autocomplete_string = "th" works.
autocomplete_string = "8th" does NOT work.

This is part of my Schema for Solr:

<fieldType name="edge_ngram" class="solr.TextField" positionIncrementGap="1">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    </analyzer>
</fieldType>

It looks like the analyzer is somehow splitting "9th Grade" into numbers and words. The number part is only "9", which has length 1 and is shorter than minGramSize, so the query cannot match it. How can I force Solr to index "9th" as an atomic word, so that autocompleting on "9t" works, or otherwise adjust the settings to get this working?

I would rather not decrease minGramSize to 1, but if that's the only way ..

fasouto · Accepted Answer · 2014-05-05T15:49:17

Please check out http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory. You may want to set splitOnNumerics to 0. From that page:

splitOnNumerics="1" causes alphabet => number transitions to generate a new part [Solr 1.3]:
    "j2se" => "j" "2" "se"
    default is true ("1"); set to 0 to turn off

(I'm not a Solr expert, so I'm not 100% sure of this.)
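
One way to sanity-check this suggestion without touching Solr is a rough Python model of the analyzer chain from the question. This is only a sketch: the function names (word_delimiter, edge_ngrams, matches) are made up for illustration, the splitting rules are a simplification of what WordDelimiterFilterFactory and EdgeNGramFilterFactory really do, and it assumes every query term must be found in the index (AND matching).

```python
import re


def word_delimiter(token, split_on_numerics):
    """Simplified stand-in for WordDelimiterFilterFactory on a lowercased token."""
    # Drop non-alphanumeric delimiters, keeping the runs between them.
    parts = re.findall(r"[a-z0-9]+", token)
    if not split_on_numerics:
        return parts
    # splitOnNumerics="1": break at every letter <-> digit transition.
    split_parts = []
    for part in parts:
        split_parts.extend(re.findall(r"[a-z]+|[0-9]+", part))
    return split_parts


def edge_ngrams(token, min_gram=2, max_gram=15):
    """Front edge n-grams; tokens shorter than min_gram yield nothing."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def index_terms(text, split_on_numerics):
    """Index-time chain: whitespace split, lowercase, word delimiter, edge n-grams."""
    terms = set()
    for token in text.lower().split():
        for part in word_delimiter(token, split_on_numerics):
            terms.update(edge_ngrams(part))
    return terms


def query_terms(text, split_on_numerics):
    """Query-time chain: same as index time, but without the n-gram step."""
    terms = []
    for token in text.lower().split():
        terms.extend(word_delimiter(token, split_on_numerics))
    return terms


def matches(tag, query, split_on_numerics):
    """Assumption: a tag matches only if every query term is an indexed term."""
    indexed = index_terms(tag, split_on_numerics)
    return all(term in indexed for term in query_terms(query, split_on_numerics))


if __name__ == "__main__":
    for split in (True, False):
        for tag, query in [("8th Grade", "8th"), ("10th Grade", "10"), ("8th Grade", "th")]:
            print(split, repr(tag), repr(query), matches(tag, query, split))
```

In this model, with splitting on, "8th" is indexed as "8" and "th"; "8" is shorter than minGramSize and produces no grams, so the query term "8" has nothing to match. With splitting off, "8th" is indexed as "8t" and "8th", so both "8t" and "8th" match. The trade-off, at least in this model, is that "th" on its own no longer matches "8th Grade", because "th" stops being a separate token.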
