Hierarchical scoring Lucene, OR term treatment

Question

I was trying to translate an interest profile into some Lucene query.

Given a title term and some expansion term, in JSON format, such as

{"title":"Donald Trump", "Expansion":[["republic","republican"],["democratic","democrat"],["campaign"]]}

The corresponding Lucene query can be a BooleanQuery like following (Set title term boost factor as 3.0 while expansion term boost factor as 1.0).

+(text:donald^3.0 text:trump^3.0 (text:democrat text:democratic) (text:republic text:republican) text:campaign)

Using IndexSearcher's explain() method,

A matching document like,

I know people just want to find a way to be famous without taking any risks, republic republican Donald Trump Campaign.

has a scoring of 9.0

3.0 = weight(text:donald^3.0 in 0) [TitleExpansionSimilarity], result of:
    3.0 = score(doc=0,freq=1.0), product of:
      3.0 = queryWeight, product of:
        3.0 = boost
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = queryNorm
      1.0 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = fieldNorm(doc=0)
  3.0 = weight(text:trump^3.0 in 0) [TitleExpansionSimilarity], result of:
    3.0 = score(doc=0,freq=1.0), product of:
      3.0 = queryWeight, product of:
        3.0 = boost
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = queryNorm
      1.0 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = fieldNorm(doc=0)
  2.0 = sum of:
    1.0 = weight(text:republic in 0) [TitleExpansionSimilarity], result of:
      1.0 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = fieldNorm(doc=0)
    1.0 = weight(text:republican in 0) [TitleExpansionSimilarity], result of:
      1.0 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = fieldNorm(doc=0)
  1.0 = weight(text:campaign in 0) [TitleExpansionSimilarity], result of:
    1.0 = fieldWeight in 0, product of:
      1.0 = tf(freq=1.0), with freq of:
        1.0 = termFreq=1.0
      1.0 = idf(docFreq=201, maxDocs=201)
      1.0 = fieldNorm(doc=0)

Is there any way to rewrite Lucene scoring function, to score the BooleanQuery (text:republic text:republican) aka. cluster ["republic","republican"] as a maximum of either the matching weight of "republic" or the matching weight of "republican"?

1.0 = MAX(instead of sum) of:
    1.0 = weight(text:republic in 0) [TitleExpansionSimilarity], result of:
      1.0 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = fieldNorm(doc=0)
    1.0 = weight(text:republican in 0) [TitleExpansionSimilarity], result of:
      1.0 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = fieldNorm(doc=0)

femtoRgon femtoRgon · Accepted Answer · 2016-06-08T15:40:09

Not via Lucene's QueryParser syntax, but you can use a DisjunctionMaxQuery, instead of a BooleanQuery to combine queries and score with the maximum score of it's subqueries, instead of the sum of the subqueries' scores.

Hierarchical scoring Lucene, OR term treatment

1 Answers