3
votes

I am using Apache Solr. I would like search results to show exact matches first, followed by fuzzy matches.

For example, when I search for the word test, the response lists results containing words like cast and latest before the results that contain the actual word test.

I tried queries like test^100 OR test~^5, but that did not change the order of the results. Maybe this change to the query is not correct.

My Solr config:

solrconfig.xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="defType">edismax</str>
    <int name="rows">10</int>
    <str name="df">text</str>
    <str name="wt">php</str>
    <str name="sort">score desc</str>
    <str name="fl">*, score</str>

    <!-- Highlighting defaults -->
    <str name="hl">on</str>
    <str name="hl.fl">*</str>
    <str name="hl.snippets">10</str>
    <str name="hl.fragsize">250</str>
    <str name="tie">0.1</str>
    <str name="hl.simple.pre">&lt;b style="color:black;background-color:#888888"&gt;</str>
    <str name="hl.simple.post">&lt;/b&gt;</str>
    <str name="hl.usePhraseHighlighter">true</str>
    <str name="hl.highlightMultiTerm">true</str>
    <str name="hl.useFastVectorHighlighter">true</str>
    <str name="hl.maxAnalyzedChars">200000</str>
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.description.hl.alternateField">description</str>
    <str name="spellcheck">true</str>
  </lst>
</requestHandler>

schema.xml  
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="50"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory" />
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      <filter class="solr.StandardFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
      <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>

Thanks,

2 Answers

1
votes

I was looking for a solution to the same problem. After going through some documentation and mailing-list threads, I concluded that Solr has no built-in way to achieve this directly. The method below is not very clean or efficient, but this is how I solved the problem:

Duplicate each word of the query. Keep one copy as is and give it the higher boost; append a tilde (~) to the other copy and give it a lower boost. The set of results stays the same, and only the ranking changes: exact matches are placed higher.

For example: (bangalore)^20 (bangalore~)^10

This may be less efficient than a normal search, because each word is now searched twice, but it is worth doing if relevancy matters more than efficiency. In practice, the extra clause per word does not slow the search down as much as one would expect.
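
The query rewriting described above can be done in client code before the request is sent to Solr. The following is only a minimal sketch in Python: the function name `exact_then_fuzzy` and its boost defaults are hypothetical, taken from the bangalore example, and it assumes the user input contains plain words with no Solr special characters that would need escaping.

```python
def exact_then_fuzzy(user_query: str, exact_boost: int = 20, fuzzy_boost: int = 10) -> str:
    """Build '(words)^exact_boost (words~)^fuzzy_boost' from plain user input.

    Hypothetical helper: it does not escape Solr query syntax characters,
    so it is only safe for simple whitespace-separated words.
    """
    terms = user_query.split()
    # Exact copy of the query, boosted higher.
    exact = " ".join(terms)
    # Fuzzy copy: a tilde appended to every word, boosted lower.
    fuzzy = " ".join(f"{term}~" for term in terms)
    return f"({exact})^{exact_boost} ({fuzzy})^{fuzzy_boost}"


print(exact_then_fuzzy("bangalore"))
```

The resulting string would be passed as the q parameter of the request; the boost values are arbitrary and probably need tuning against real data.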

0
votes

One way to do it is with "Boosting Ranking Terms": build a boolean query in which the main query part is marked as mandatory and the ranking terms are optional clauses with a high boost.

Your query should look something like one of the following:

+(basequery) rankingterm1^100
+(basequery) rankingterm1^10000 rankingterm2^100

e.g.:

+(test OR test~) test^100

This way, everything inside the parentheses is mandatory because of the leading plus sign (+), and everything outside is optional with a high boost. A document that matches both the base query and the ranking terms is therefore ranked higher than a document that matches only the mandatory part of the query (the base query).

You can read more about it here: https://cwiki.apache.org/confluence/display/solr/SolrRelevancyCookbook#SolrRelevancyCookbook-BoostingRankingTerms
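
If the query string is assembled in client code, the mandatory-plus-optional pattern shown above could be generated along these lines. This is only a sketch in Python: `boost_ranking_terms` is a hypothetical helper, not part of Solr or of any client library, and it assumes the base query and ranking terms are already valid, escaped query fragments.

```python
from typing import Mapping


def boost_ranking_terms(base_query: str, ranking_terms: Mapping[str, int]) -> str:
    """Return '+(base_query) term1^boost1 term2^boost2 ...'.

    Hypothetical helper: the base query becomes a mandatory clause and each
    ranking term becomes an optional clause with the given boost.
    """
    # The leading '+' makes the parenthesised base query mandatory.
    clauses = [f"+({base_query})"]
    # Optional clauses only influence the score, not which documents match.
    clauses += [f"{term}^{boost}" for term, boost in ranking_terms.items()]
    return " ".join(clauses)


print(boost_ranking_terms("test OR test~", {"test": 100}))
```
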