3
votes

I'm a bit confused by the Lucene.NET API, though it may just be a misunderstanding on my part, as I'm still learning.

When you create a document, you add fields to that document. An example:

//Create the field.
field = new Field(
    fieldName,
    fieldValue,
    isFieldStorable ? Field.Store.YES : Field.Store.NO,
    Field.Index.ANALYZED
);

//If a boost value was supplied, then set the boost for this field.
if (boostValue != null) {
    field.SetBoost((float)boostValue);
}

This correctly sets the boost on the field. The field is then added to a document, and the document is added to the index writer.

But it doesn't look like setting the boost on the field has any effect. How could it ever make a difference? When I create my query, I need to call something like:

multiFieldQueryParser = new MultiFieldQueryParser(
    Lucene.Net.Util.Version.LUCENE_29,
    fieldsToSearch.ToArray(),
    analyzer
);

The MultiFieldQueryParser constructor also lets me supply a dictionary of boosts, so what's the point of setting the boost on the field? The query parser doesn't know anything about my documents or the fields they contain (and as a result, doesn't know anything about my field boosts).

Is this just old code that was left in the library? Or can setting the boost on the field actually make a difference if the code is structured differently?

2 Answers

2
votes

Lucene supports both index-time boosts (at the document and field level) and query-time boosts.

When you create a document and set the boost on a field, you are using an index-time boost.

The boost arguments to MultiFieldQueryParser are for query-time boosting. You don't need to pass them if you only want the index-time boost values to apply. The index-time boost is folded into the field's norm, which is stored in the index, so it is used implicitly in the score calculation without the query parser knowing about it.
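To make the distinction concrete, here is roughly how the two kinds of boost enter the score, as I recall it from the documentation of Lucene's Similarity class (the Java naming is used here; treat this as a sketch of the classic TF-IDF formula rather than a version-specific reference):

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \Big( \mathrm{tf}(t \in d)\cdot\mathrm{idf}(t)^{2}\cdot
  \underbrace{t.\mathrm{getBoost}()}_{\text{query-time boost}}\cdot
  \underbrace{\mathrm{norm}(t,d)}_{\text{index-time}} \Big)

\mathrm{norm}(t,d) = \mathrm{doc.getBoost}()\cdot\mathrm{lengthNorm}(\mathrm{field})\cdot
  \prod_{f \in d,\; f \text{ named as } t} f.\mathrm{getBoost}()
```

The field boost set at indexing time only appears inside norm(t,d), which is computed when the document is written and stored in a single byte per field, so some precision is lost. The boosts passed to the query parser only appear as t.getBoost(). The two multiply together, so you can use either one or both.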

2
votes

I use the same logic as Ek0nomik and get a constant score for the expanded queries.

I call SetMultiTermRewriteMethod(SCORING_BOOLEAN_QUERY_REWRITE) on a MultiFieldQueryParser and search two fields in the index. The first field has a boost value of 2 and the second has the default value of 1.
After indexing a document, I can see the correct norm values in the index (using Luke). But when I search the index with a term that occurs in different fields, I get the same relevance for all results.
I would expect a higher relevance for matches in the field whose boost value is 2. Am I misunderstanding how the scoring should work?

In debug mode I can't see any boost syntax in the query after my search term has been parsed. It looks like this: ...description:*test* title:*test* path:*test*....
Shouldn't there be boost values in the search query, like ...description:*test* title:*test*^2 path:*test*...?