I have sets of three items in Azure Search that are identical in their text fields and vary only in Price and Points. Cheaper products with more points are boosted higher. (Price is boosted more than Points, and is boosted inversely.)

However, I keep seeing search results similar to this.

The search text is ‘john milton’.

I get:

Product="Id = 2-462109171829-1, Price=116.57, Points=  7, Name=Life of Schamyl / John Milton Mackie, Description=.", Score=32.499783
Product="Id = 2-462109171829-2, Price=116.40, Points=  9, Name=Life of Schamyl / John Milton Mackie, Description=.", Score=32.454872
Product="Id = 2-462109171829-3, Price=115.64, Points=  9, Name=Life of Schamyl / John Milton Mackie, Description=.", Score=32.316270

I expect the scoring order to be something like this, with the lowest price first.

Product="Id = 2-462109171829-3, Price=115.64, Points=  9, Name=Life of Schamyl / John Milton Mackie, Description=.", Score=
Product="Id = 2-462109171829-2, Price=116.40, Points=  9, Name=Life of Schamyl / John Milton Mackie, Description=.", Score=
Product="Id = 2-462109171829-1, Price=116.57, Points=  7, Name=Life of Schamyl / John Milton Mackie, Description=.", Score=

What am I missing, or are minor scoring variations to be expected?

The index is defined as

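open System.Collections.Generic     // List<T>
open Microsoft.Azure.Search.Models  // Field, DataType, Index, ScoringProfile, scoring functions

let ProductDataIndex = 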

        let fields = 
                    [|
                        new Field (
                            "id", 
                            DataType.String,
                            IsKey           = true, 
                            IsSearchable    = true);


                        new Field (
                            "culture", 
                            DataType.String,
                            IsSearchable    = true);

                        new Field (
                            "gran", 
                            DataType.String,
                            IsSearchable    = true);

                        new Field (
                            "name", 
                            DataType.String,
                            IsSearchable    = true);

                        new Field (
                            "description", 
                            DataType.String, 
                            IsSearchable    = true);

                        new Field (
                            "price", 
                            DataType.Double, 
                            IsSortable      = true,
                            IsFilterable    = true)

                        new Field (
                            "points", 
                            DataType.Int32, 
                            IsSortable      = true,
                            IsFilterable    = true)
                    |]

        let weightsText = 
            new TextWeights(
                Weights =   ([|  
                                ("name",        4.); 
                                ("description", 2.) 
                            |]
                            |> dict))

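        // Inverse price boost: the range runs from 1000 down to 0,
        // so lower prices receive more of the 10.0 boost.
        let priceBoost = 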
            new MagnitudeScoringFunction(
                new MagnitudeScoringParameters(
                    BoostingRangeStart  = 1000.0,
                    BoostingRangeEnd    = 0.0,
                    ShouldBoostBeyondRangeByConstant = true),
                "price",
                10.0)

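        // Weaker points boost (2.0) over 0..10,000,000; point values such as
        // 7 or 9 sit at the very bottom of this range.
        let pointsBoost = 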
            new MagnitudeScoringFunction(
                new MagnitudeScoringParameters(
                    BoostingRangeStart  = 0.0,
                    BoostingRangeEnd   = 10000000.0,
                    ShouldBoostBeyondRangeByConstant = true),
                "points",
                2.0)

        let scoringProfileMain = 
            new ScoringProfile (
                            "main", 
                            TextWeights =
                                weightsText,
                            Functions = 
                                new List<ScoringFunction>(
                                        [
                                            priceBoost      :> ScoringFunction
                                            pointsBoost     :> ScoringFunction
                                        ]),
                            FunctionAggregation = 
                                ScoringFunctionAggregation.Sum)

        new Index 
            (Name               =   ProductIndexName
            ,Fields             =   fields 
            ,ScoringProfiles    =   new List<ScoringProfile>(
                                        [
                                            scoringProfileMain
                                        ]))
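A rough way to see why three near-identical products can swap places is to estimate how far apart the two boosts actually push them. The F# sketch below assumes a simplified magnitude model in which the multiplier rises linearly from 1.0 at `BoostingRangeStart` to the configured boost at `BoostingRangeEnd` and is clamped outside the range. The service's exact formula may differ, and `magnitudeBoost` is a hypothetical helper, not an SDK function.

```fsharp
// ASSUMPTION: simplified linear magnitude model, not the exact Azure Search formula.
// The multiplier goes from 1.0 at rangeStart to `boost` at rangeEnd and is
// clamped to that interval for values outside the range.
let magnitudeBoost (rangeStart: float) (rangeEnd: float) (boost: float) (value: float) =
    let t = (value - rangeStart) / (rangeEnd - rangeStart)
    let t = max 0.0 (min 1.0 t)
    1.0 + (boost - 1.0) * t

// Parameters taken from the index definition in the question.
let priceBoostOf = magnitudeBoost 1000.0 0.0 10.0
let pointsBoostOf = magnitudeBoost 0.0 10000000.0 2.0

// The three copies from the question: id, price, points.
let products =
    [ "2-462109171829-1", 116.57, 7
      "2-462109171829-2", 116.40, 9
      "2-462109171829-3", 115.64, 9 ]

for (pid, price, points) in products do
    printfn "%s  price boost = %.5f  points boost = %.7f"
        pid (priceBoostOf price) (pointsBoostOf (float points))

// Relative gap between the cheapest and the most expensive copy's price boost.
let relativeGap = priceBoostOf 115.64 / priceBoostOf 116.57 - 1.0
printfn "relative price-boost gap = %.4f%%" (relativeGap * 100.0)
```

Under this model the price boosts of the three copies differ by only about 0.09%, and the points boost is effectively 1.0 for all of them. Any larger difference in the text-relevance part of the score is enough to reorder them.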
Hi Hocho, a quick clarifying question: how many documents are in your index? Scoring in indexes with a low document count may be a little off. This is a result of how indexes are organized internally to enable efficient scale-up and scale-down of your distributed service. – Yahnoosh
30+ million documents. I am doing some proof-of-concept testing, so each document is replicated three times, with all fields identical except the identifying field and the Price and Points fields, which are randomly generated within 10% of each other. – hocho
Thanks! Do you see the same behavior when you issue a less selective query? For example: "John" (assuming you have more than one John in your dataset :)) – Yahnoosh
Yes, I see the same behavior on all queries. Most results show up in the expected order, but about 5 to 10% show up in an unexpected order. – hocho
Thanks. I'll need more information to answer this. I'll follow up over email and then summarize my findings here once we find the root cause. – Yahnoosh

All indexes in Azure Search are split into multiple shards, which allows us to scale up and down quickly. When a search request is issued, it is sent to each shard independently. The result sets from the shards are then merged and ordered by score (if no other ordering is defined). It is important to know that the scoring function weighs a query term's frequency in each document against its frequency across all documents in that shard, not across the whole index!
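The per-shard effect can be illustrated with a small F# sketch. It uses a simplified TF-IDF in the style of classic Lucene scoring (square root of term frequency times idf squared, with no length norms), which is an assumption and not the service's exact formula; the shard statistics are invented for illustration. The same document text gets a different score on each shard only because each shard computes idf from its own document counts, and the merged result is then ordered by those scores.

```fsharp
// ASSUMPTION: simplified classic-Lucene-style TF-IDF (no field norms, no
// coordination factor, no scoring profile). Shard statistics are made up.
type ShardStats = { DocCount: int; DocFreq: Map<string, int> }

// Inverse document frequency computed from ONE shard's statistics.
let idf (stats: ShardStats) (term: string) =
    let df = defaultArg (Map.tryFind term stats.DocFreq) 0
    1.0 + log (float stats.DocCount / float (df + 1))

// Sum over query terms of sqrt(tf) * idf^2.
let score (stats: ShardStats) (queryTerms: string list) (termFreqs: Map<string, int>) =
    queryTerms
    |> List.sumBy (fun term ->
        let tf = defaultArg (Map.tryFind term termFreqs) 0 |> float |> sqrt
        let w = idf stats term
        tf * w * w)

// Two hypothetical shards with slightly different term statistics.
let shardA = { DocCount = 1000; DocFreq = Map.ofList [ "john", 50; "milton", 5 ] }
let shardB = { DocCount = 1000; DocFreq = Map.ofList [ "john", 60; "milton", 7 ] }

// Identical text in every copy of the document.
let query = [ "john"; "milton" ]
let docTerms = Map.ofList [ "john", 1; "milton", 1 ]

// Each shard scores its own copy; the results are then merged by score.
let merged =
    [ "copy on shard A", score shardA query docTerms
      "copy on shard B", score shardB query docTerms ]
    |> List.sortByDescending snd

for (name, s) in merged do
    printfn "%s: %.3f" name s
```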

This means that in your scenario, where you have three instances of every document, even with scoring profiles disabled, a document that lands on a different shard from its two copies will get a slightly different score. The more data in your index, the smaller these differences will be, because terms are distributed more evenly across shards. It is not possible to predict which shard a given document will be placed on.
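To see why more data shrinks the differences, the following F# sketch assumes a rare term that appears in a fraction p of documents and models each shard's observed document frequency as deviating from p × N by about one standard deviation (roughly sqrt(p × N) for small p). The idf formula, 1 + ln(N / (df + 1)), is a classic-Lucene-style simplification rather than the service's exact scoring. Under these assumptions the idf gap between a "high" shard and a "low" shard falls roughly in proportion to 1 / sqrt(p × N).

```fsharp
// ASSUMPTION: simplified idf of the form 1 + ln(N / (df + 1)).
let idf (docCount: float) (docFreq: float) =
    1.0 + log (docCount / (docFreq + 1.0))

// ASSUMPTION: each shard's document frequency deviates from its expected
// value p * N by about one standard deviation, approximated as sqrt(p * N).
// Returns the idf difference between a shard seeing one standard deviation
// fewer matching documents and one seeing one standard deviation more.
let idfGap (p: float) (docsPerShard: float) =
    let expected = p * docsPerShard
    let spread = sqrt expected
    idf docsPerShard (expected - spread) - idf docsPerShard (expected + spread)

let p = 0.01
for n in [ 1e3; 1e5; 1e7 ] do
    printfn "docs per shard = %.0f  idf gap = %.5f" n (idfGap p n)
```

With p = 0.01 the gap drops by about a factor of ten for every hundredfold increase in documents per shard, but under this model it never reaches zero, so near-identical copies can still swap places in a large index.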

In general, document score is not the best attribute for ordering documents. It should only give you a general sense of a document's relevance compared with other documents in the result set. In your scenario, you can order the results by price and/or points if you mark the price and/or points fields as sortable. You can find more information on how to use the $orderby query parameter here: https://msdn.microsoft.com/en-us/library/azure/dn798927.aspx
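As a minimal sketch of explicit ordering, the F# function below builds a query URL for the Azure Search REST "search documents" endpoint that sorts by price ascending and then by points descending instead of relying on score. The service name, index name, and `api-version` value are placeholders and assumptions; check the linked documentation for the version your service supports. A real request also needs an `api-key` header, which this sketch does not send, and both fields must be sortable (`IsSortable = true`) in the index.

```fsharp
open System

// Builds a query URL for the Azure Search "search documents" REST endpoint.
// service, index and apiVersion are placeholders supplied by the caller.
let buildSearchUrl (service: string) (index: string) (apiVersion: string)
                   (searchText: string) (orderBy: string list) =
    let query =
        [ "search", searchText
          "$orderby", String.concat "," orderBy
          "api-version", apiVersion ]
        |> List.map (fun (key, value) -> key + "=" + Uri.EscapeDataString value)
        |> String.concat "&"
    sprintf "https://%s.search.windows.net/indexes/%s/docs?%s" service index query

// Cheapest first; among equal prices, more points first.
let url =
    buildSearchUrl "my-service" "products" "2015-02-28"
                   "john milton" [ "price asc"; "points desc" ]

printfn "%s" url
```

Because the sort is on field values, the three copies come back in the same order no matter which shard each one lives on.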