6
votes

At index time I am boosting certain documents this way:

if (myCondition)  
{
   document.SetBoost(1.2f);
}

But at search time, documents that are otherwise identical all end up with the same score, whether they pass or fail myCondition.

And here is the search code:

BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.Add(new TermQuery(new Term(FieldNames.HAS_PHOTO, "y")), BooleanClause.Occur.MUST);
booleanQuery.Add(new TermQuery(new Term(FieldNames.AUTHOR_TYPE, AuthorTypes.BLOGGER)), BooleanClause.Occur.MUST_NOT);
indexSearcher.Search(booleanQuery, 10);

Can you tell me what I need to do to get the documents that were boosted to get a higher score?

Many Thanks!

2
I would recommend posting minimal, working code that shows "I index this way, I search this way, and this doesn't work", leaving nothing to the reader's imagination. - L.B
I'm certainly not an expert, but you should try loading your index into Luke and checking the boost values for each document. Also try checking the Explain() method result to see what type of query is being executed. My guess is it's a ConstantScoreQuery. From the Lucene docs: "A query that wraps another query or a filter and simply returns a constant score equal to the query boost for every document that matches the filter or query. For queries it therefore simply strips of all scores and returns a constant one." I'm not sure how to prevent this, but it seems like this query ignores scoring. - goalie7960
@goalie7960 can you post a link to the page you are referencing from the docs? That may be the clue I need. - Barka
So you don't want to share any code? - L.B
This is for Lucene 3.1, so it may not help. There are also conflicting docs for this class if you search for it. I was also able to replicate your issue with a simple program. lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/search/… - goalie7960

2 Answers

6
votes

Lucene encodes boosts in a single byte (whereas a float normally takes four bytes) using the SmallFloat#floatToByte315 method. As a consequence, a lot of precision can be lost when the byte is converted back to a float.

In your case, SmallFloat.byte315ToFloat(SmallFloat.floatToByte315(1.2f)) returns 1f because 1f and 1.2f are too close to each other to be told apart after encoding. Try using a bigger boost so that your documents get different scores. (For example, with 1.25, SmallFloat.byte315ToFloat(SmallFloat.floatToByte315(1.25f)) gives 1.25f.)
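
To see the rounding without touching an index, here is a minimal standalone sketch of the byte round trip. It is a hand-written C# port of the bit manipulation, not Lucene's own source; the class name BoostPrecisionDemo and the RoundTrip helper are made up for illustration, and it assumes the encoding simply keeps the top bits of the float and shifts the exponent.

```csharp
using System;

public static class BoostPrecisionDemo
{
    // Hypothetical port of the float -> byte encoding (assumption, not Lucene's code).
    public static byte FloatToByte315(float f)
    {
        int bits = BitConverter.ToInt32(BitConverter.GetBytes(f), 0);
        int small = bits >> 21;            // keep sign, exponent and the top mantissa bits
        const int zero = (63 - 15) << 3;   // exponent offset of the one-byte format
        if (small <= zero)
        {
            // Zero, negative or too small: clamp to 0 or to the smallest positive code.
            return (byte)(bits <= 0 ? 0 : 1);
        }
        if (small >= zero + 0x100)
        {
            return 0xFF;                   // too large: clamp to the biggest code
        }
        return (byte)(small - zero);
    }

    // Inverse mapping: rebuild a float from the one-byte code.
    public static float Byte315ToFloat(byte b)
    {
        if (b == 0)
        {
            return 0f;
        }
        int bits = (b << 21) + ((63 - 15) << 24);
        return BitConverter.ToSingle(BitConverter.GetBytes(bits), 0);
    }

    // Encode then decode, i.e. what a stored boost looks like at search time.
    public static float RoundTrip(float boost)
    {
        return Byte315ToFloat(FloatToByte315(boost));
    }

    public static void Main()
    {
        foreach (float boost in new[] { 1.0f, 1.2f, 1.25f, 1.5f })
        {
            Console.WriteLine("{0} -> {1}", boost, RoundTrip(boost));
        }
    }
}
```

Under this port, 1.2f comes back as 1f while 1.25f survives unchanged. Values are truncated rather than rounded, so between 1 and 2 the only boosts that survive are 1, 1.25, 1.5 and 1.75.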

2
votes

Here is the requested test program, which was too long to post in a comment.

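using System;
using System.Text;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

class Program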
{
    static void Main(string[] args)
    {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer());

        const string FIELD = "name";

        for (int i = 0; i < 10; i++)
        {
            StringBuilder notes = new StringBuilder();
            notes.AppendLine("This is a note 123 - " + i);

            string text = notes.ToString();

            Document doc = new Document();
            var field = new Field(FIELD, text, Field.Store.YES, Field.Index.NOT_ANALYZED);

            if (i % 2 == 0)
            {
                field.SetBoost(1.5f);
                doc.SetBoost(1.5f);
            }
            else 
            {
                field.SetBoost(0.1f);
                doc.SetBoost(0.1f);
            }

            doc.Add(field);
            writer.AddDocument(doc);
        }

        writer.Commit();

        //string TERM = QueryParser.Escape("*+*");
        string TERM = "T";

        IndexSearcher searcher = new IndexSearcher(dir);
        Query query = new PrefixQuery(new Term(FIELD, TERM));
        var hits = searcher.Search(query);            
        int count = hits.Length();

        Console.WriteLine("Hits - {0}", count);

        for (int i = 0; i < count; i++)
        {
            var doc = hits.Doc(i);
            Console.WriteLine(doc.ToString());

            // Explain() expects the Lucene document id, not the position in the hit list.
            var explain = searcher.Explain(query, hits.Id(i));
            Console.WriteLine(explain.ToString());
        }
    }
}