Solr: Scoring exact matches higher than partial matches

Question

In a very simple case, I have three documents with filenames "Lark", "Larker", and "Larking" (no file extension). In Solr, I index these three documents, mapping the filename to a "title" field. When I search for "Lark", all three documents are returned (which is what I want), but they are all given the same score. I would prefer that "Lark" be scored the highest, as it is an exact match for my query, with the others coming after it.

<field name="title" type="text_general" indexed="true" stored="true" multiValued="false"/>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I believe they get the same score because of the EdgeNGramFilterFactory applied at index time. After lowercasing, each document is indexed as "la", "lar", "lark", and two of the documents ("Larker" and "Larking") are indexed with some additional, longer grams. In effect, each document is an exact match for the query "Lark". I would like a way to run a query for "Lark" that returns all three documents but ranks the document titled "Lark" above the others.

Results of query debug:

<lst name="debug">
  <str name="rawquerystring">Lark</str>
  <str name="querystring">Lark</str>
  <str name="parsedquery">text:lark</str>
  <str name="parsedquery_toString">text:lark</str>
  <lst name="explain">
    <str name="543d6ee4cbb33c26bbcf288b/xxnullxx/543d6ef9cbb33c26bbcf2892">
2.7104912 = (MATCH) weight(text:lark in 0) [DefaultSimilarity], result of:
  2.7104912 = fieldWeight in 0, product of:
    1.4142135 = tf(freq=2.0), with freq of:
      2.0 = termFreq=2.0
    3.8332133 = idf(docFreq=3, maxDocs=68)
    0.5 = fieldNorm(doc=0)
</str>
    <str name="543d6ee4cbb33c26bbcf288b/xxnullxx/543d6ef9cbb33c26bbcf2893">
2.7104912 = (MATCH) weight(text:lark in 1) [DefaultSimilarity], result of:
  2.7104912 = fieldWeight in 1, product of:
    1.4142135 = tf(freq=2.0), with freq of:
      2.0 = termFreq=2.0
    3.8332133 = idf(docFreq=3, maxDocs=68)
    0.5 = fieldNorm(doc=1)
</str>
    <str name="543d6ee4cbb33c26bbcf288b/xxnullxx/543d6ef9cbb33c26bbcf2894">
2.7104912 = (MATCH) weight(text:lark in 2) [DefaultSimilarity], result of:
  2.7104912 = fieldWeight in 2, product of:
    1.4142135 = tf(freq=2.0), with freq of:
      2.0 = termFreq=2.0
    3.8332133 = idf(docFreq=3, maxDocs=68)
    0.5 = fieldNorm(doc=2)
</str>
  </lst>
</lst>

It's strange that you are getting the same score for all three docs. fieldNorm should be lowest for Larking and highest for Lark, so Lark should get the highest score. Can you rerun your query with debugQuery=on&wt=xml and check what fieldNorm you are getting for each doc? (comment by arun)

Yann Yann · Accepted Answer · 2014-10-15T11:34:34

To boost the exact matches, you could create a new field, called "exact_title", with a new type "text_exact" that doesn't have the EdgeNGramFilterFactory.
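
As a minimal sketch, the new type and field might look like the following. The names "text_exact" and "exact_title" come from the suggestion above; the exact filter chain (standard tokenizer plus lowercasing) and stored="false" are assumptions, not something your schema requires.

```xml
<!-- Sketch only: similar to text_general, but with no EdgeNGramFilterFactory,
     so "Larker" is indexed as the single token "larker" and not as "la", "lar", "lark", ... -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Assumption: the field is only searched, never displayed, hence stored="false" -->
<field name="exact_title" type="text_exact" indexed="true" stored="false" multiValued="false"/>
```

With the standard tokenizer, "exact" means an exact token match, so a title such as "Lark Bunting" would still match the query "Lark" in this field. If you only want whole-title matches, solr.KeywordTokenizerFactory could replace the tokenizer; that is a design choice to weigh against your data.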

In your schema you can use the line:

<copyField source="title" dest="exact_title"/>

to copy title to exact_title.

Then run your query against both fields, title and exact_title. In your example, all three documents match in title, but only the document titled "Lark" also matches in exact_title, so it receives a higher score than the other documents and rises to the top.
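
One way to query both fields is the edismax query parser with a boost on exact_title. The fragment below is a sketch for solrconfig.xml; the handler name "/select" and the boost value of 10 are assumptions you would adapt and tune.

```xml
<!-- Sketch only: search title and exact_title together, weighting exact_title more heavily -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- ^10 is an arbitrary illustrative boost, not a recommended value -->
    <str name="qf">title exact_title^10</str>
  </lst>
</requestHandler>
```

The same thing can be passed per request as defType=edismax&qf=title exact_title^10. Note that your debug output shows the query being parsed as text:lark, so it currently runs against the default "text" field; naming the fields explicitly in qf avoids that.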
