I am having an issue with Solr results and thought I'd ask for suggestions here.
I have enabled phonetic matching by including <filter class="solr.PhoneticFilterFactory" encoder="RefinedSoundex" inject="true"/>
in both the index and the query analyzers. I have also tried the DoubleMetaphone encoder as a variation.
The issue is that Solr returns only phonetically matched results and disregards wildcard matches and near-exact matches of the search phrase.
Example:
In my index, I have a document with a field called 'name' whose value is 'Modenine'. When I search for name:mod, I get 'Modenine', which is OK.
But when I search for name:mode (note the extra 'e'), it returns 'Something Foul Mouth', because mouth phonetically matches mode. I don't mind having 'Something Foul Mouth' as a result, but I also want to see 'Modenine', since mode is the actual search term.
The quickest solution that comes to mind is to add the phonetic code to the index at indexing time, then use dismax to rank the results by applying a boost, for example ^2.0.
I have the following field declarations:
<field name="phoneticName" type="phonetics" indexed="true" stored="true"/>
<field name="name" type="phonetics" indexed="true" stored="true"/>
And the fieldType for phonetics:
<fieldType name="phonetics" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.PhoneticFilterFactory" encoder="RefinedSoundex" inject="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="RefinedSoundex" inject="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
But after re-indexing, the phoneticName field only contains the exact value of the name field. It doesn't store the phonetic code that I want to search by.
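To make the idea concrete, here is a minimal sketch of the setup I am aiming for. It is untested and rests on several assumptions of mine: the type names text_exact and phonetic_only are made up for illustration, the copyField from name to phoneticName is how I imagine the phonetic field would be populated, and the edismax defaults on the /select handler are just one way to apply the ^2.0 boost.

```xml
<!-- schema.xml (sketch; type names text_exact and phonetic_only are hypothetical) -->

<!-- Non-phonetic type, so that name only matches the literal (lowercased) terms -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Phonetic-only type: inject="false" replaces each token with its phonetic code -->
<fieldType name="phonetic_only" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="RefinedSoundex" inject="false"/>
  </analyzer>
</fieldType>

<field name="name" type="text_exact" indexed="true" stored="true"/>
<!-- stored="false": stored values are the raw input; the codes exist only as indexed terms -->
<field name="phoneticName" type="phonetic_only" indexed="true" stored="false"/>

<!-- Assumption: phoneticName is filled by copying name at index time -->
<copyField source="name" dest="phoneticName"/>

<!-- solrconfig.xml (sketch): weight literal matches above phonetic-only matches -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">name^2.0 phoneticName</str>
  </lst>
</requestHandler>
```

With something like this, a query term would be searched against both fields, and documents that match on name should score above those that only match phonetically.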
I found this related question, solr-boosting-down-phonetic-variations, but it doesn't have much detail.
Thanks P