What is the difference between index-time field boosts (field.setBoost(boost)) and query-time boosts (query.setBoost(boost))?

Lucene's FAQ seems to conflict with the Javadoc (Lucene 4.9.0).

FAQ:

Index time field boosts (field.setBoost(boost)) are a way to express things like "this document's title is worth twice as much as the title of most documents". Query time boosts (query.setBoost(boost)) are a way to express "I care about matches on this clause of my query twice as much as I do about matches on other clauses of my query".

Index time field boosts are worthless if you set them on every document.

JAVADOC:

Lucene allows influencing search results by "boosting" at different times:

- Index-time boost by calling Field.setBoost() before a document is added to the index.
- Query-time boost by setting a boost on a query clause, calling Query.setBoost().

Indexing time boosts are pre-processed for storage efficiency and written to storage for a field as follows:

From testing, the FAQ is wrong. Setting the same index time field boosts on all documents does affect scoring.

The Javadoc makes it sound like index-time field boosts and query-time boosts have exactly the same effect on scoring. Is this true?

Answer:

They have (roughly) the same effect, yes. The point the documentation is making is that if you boost everything, the boost will have no meaningful impact on scoring. It says that such boosts are worthless, not that they are ignored. It's just like writing a query such as this one:

field:one^2 field:two^2 field:three^2

Those query-time boosts will change the scores, yes, but since every query term is boosted by the same amount, the change is not meaningful: the relative order of the results is exactly the same as without the boosts, so they serve no practical purpose.
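A toy model makes this concrete. The sketch below assumes a simplified scoring rule in which a document's score is the sum of each matching clause's raw score multiplied by that clause's boost; it deliberately ignores Lucene's real queryNorm, coord and idf factors, and the per-clause numbers and the class name UniformBoostDemo are made up for illustration. Boosting every clause by 2 doubles every score but leaves the ranking untouched, while boosting a single clause reorders the results.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UniformBoostDemo {
    // Hypothetical raw per-clause scores for field:one, field:two, field:three.
    static final Map<String, double[]> RAW = new LinkedHashMap<>();
    static {
        RAW.put("doc1", new double[] {0.9, 0.0, 0.3});
        RAW.put("doc2", new double[] {0.2, 0.8, 0.1});
        RAW.put("doc3", new double[] {0.0, 0.4, 0.4});
    }

    // Simplified model (not Lucene's formula): score = sum of boost_i * raw_i.
    static double score(double[] clauseScores, double[] boosts) {
        double s = 0;
        for (int i = 0; i < clauseScores.length; i++) {
            s += boosts[i] * clauseScores[i];
        }
        return s;
    }

    // Returns document ids ordered from highest to lowest score.
    static List<String> rank(double[] boosts) {
        List<String> docs = new ArrayList<>(RAW.keySet());
        docs.sort(Comparator.comparingDouble((String d) -> score(RAW.get(d), boosts)).reversed());
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(rank(new double[] {1, 1, 1})); // no boosts
        System.out.println(rank(new double[] {2, 2, 2})); // field:one^2 field:two^2 field:three^2
        System.out.println(rank(new double[] {1, 3, 1})); // only field:two^3
    }
}
```

Running it prints `[doc1, doc2, doc3]` for the first two lines and `[doc2, doc3, doc1]` for the third: only the non-uniform boost changes which document comes first.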

Whether to use query-time or index-time boosts just comes down to what's convenient. If a certain field value should always be boosted, you can use an index-time boost. If you want it to be boosted for a particular query, then query-time boosting is the only thing that makes sense.
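To show where each kind of boost lives, here is a sketch in the same simplified model (hypothetical documents docA and docB, a made-up raw match score, and a made-up class name; real Lucene scoring also involves length normalization, idf, queryNorm and coord). The index-time boost is fixed per document when it is indexed and applies to every query that matches that document's title; the query-time boost is chosen per query and applies to the title clause for every document.

```java
import java.util.HashMap;
import java.util.Map;

public class BoostPlacementDemo {
    // Hypothetical index-time boosts on the "title" field, fixed at indexing
    // time (analogous to calling field.setBoost() before adding the document).
    static final Map<String, Double> TITLE_INDEX_BOOST = new HashMap<>();
    static {
        TITLE_INDEX_BOOST.put("docA", 2.0); // "docA's title is worth twice as much"
        TITLE_INDEX_BOOST.put("docB", 1.0);
    }

    // Simplified title-clause contribution: queryBoost * indexBoost * raw score.
    static double titleClause(String doc, double raw, double queryBoost) {
        return queryBoost * TITLE_INDEX_BOOST.get(doc) * raw;
    }

    public static void main(String[] args) {
        double raw = 0.5; // same made-up raw match score for both documents

        // No query-time boost: only the per-document index-time boost applies.
        System.out.println(titleClause("docA", raw, 1.0)); // 1.0
        System.out.println(titleClause("docB", raw, 1.0)); // 0.5

        // Query-time boost of 3 on the title clause, for this query only:
        // every document's title match is scaled, on top of any index-time boost.
        System.out.println(titleClause("docA", raw, 3.0)); // 3.0
        System.out.println(titleClause("docB", raw, 3.0)); // 1.5
    }
}
```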


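I say "roughly" because index-time boosts are not stored exactly: with the default similarity in Lucene 4.x, the boost is multiplied into the field's norm, and that norm is encoded into a single byte using a lossy format, which sometimes results in a noticeable loss of precision.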
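The precision loss is easy to see by running the encoding in isolation. As far as I know, Lucene 4.x's DefaultSimilarity stores the norm (index-time boost times length norm) via SmallFloat.floatToByte315 and decodes it with the equivalent of SmallFloat.byte315ToFloat. The class below is a standalone re-implementation of those two methods for experimentation (NormEncodingDemo is a made-up name; check it against the SmallFloat source for your Lucene version), and it encodes the raw boost only, without the length norm.

```java
public class NormEncodingDemo {
    // Standalone copy of Lucene's SmallFloat.floatToByte315: an 8-bit float
    // covering roughly 5.8e-10 .. 7.5e9 with only two explicit mantissa bits.
    // Extra precision is truncated by the right shift.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            // zero and negative values map to 0; tiny positive values underflow to 1
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow maps to the largest representable value
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Standalone copy of SmallFloat.byte315ToFloat.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float[] boosts = {1.0f, 1.1f, 1.2f, 1.3f, 2.0f};
        for (float boost : boosts) {
            System.out.println(boost + " -> " + decode(encode(boost)));
        }
    }
}
```

It prints `1.1 -> 1.0`, `1.2 -> 1.0` and `1.3 -> 1.25`: near 1.0 only multiples of 0.25 survive, so small index-time boosts can vanish entirely, whereas a query-time boost of 1.1 is applied at full float precision.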