0
votes

I have looked at Boosting and "Function Score Query", but I either have not understood how to use them for my purpose or have not found which technique to use to achieve my goal.

TL;DR The user tells me his preferences regarding different fields/aspects of my products, and I would like Elasticsearch to return the products that BEST match those preferences. Is this possible?

I have a class with a lot of fields given as numbers. e.g.:

public class Product
{
   public double? Weight { get; set; }
   public int? Price { get; set; }
   public double? Size { get; set; }
}

A search will be based on a series of prioritizations/scores decided at runtime, e.g.

Weight: 0 negative
Price: 5 negative
Size: 8 positive

These scores (normalized between 0 and 10) mean that this user doesn't care about the weight of the product. He cares somewhat about the price, and he wants it negatively correlated with the value of the field (i.e. he wants the price to be low, but "only" with an importance of 5 out of 10). The most important thing for this user is the size, which should be "large".

For this example I want to search across all of my products, giving a higher score to products with a large size, treating a low price as "medium" important, and not caring about the weight.

What might such a query look like?

P.S. Any links to documentation/guides for NEST/Elasticsearch would be appreciated. I haven't found the official documentation that helpful.

EDIT: Let me rephrase: A user tells me how important different aspects of my products are, e.g. the price, weight and size. To some users a low weight is VERY important (i.e. they score the importance of a LOW weight = 10), to others the price is very important, and to some the weight is important. To some none of these are important, and to some only certain fields on my product are important.

After the user has scored the importance of each aspect of my product, I need to search for the product that best matches the user's preferences.

As such, if the user thinks the weight and price are the most important, I want Elasticsearch to return products that have a very low weight and price, without caring about the size.

Example: In elastic I have 4 products: (Weight = W, Size = S, Price = P)

P1: W=200, S=40, P=2500
P2: W=50, S=10, P=2000
P3: W=400, S=45, P=4000
P4: W=200, S=45, P=3000

Low weight/Price = good, High Size = good

If a user scores:

Weight=10, Price=0, Size=5

The result should be the top X results, sorted (using the scoring system in Elasticsearch?) as follows: P2,P4,P1,P3 (because a low weight is the most important, followed by a big size, with the price being irrelevant)

If a user scores:

Weight=5, Price=3, Size=8

The result should be the top X results, sorted as follows: P4,P3,P1,P2 (because a high/big size is the most important, followed by a low weight, with the price being of less importance)

1
Sorry, but what are you trying to do here: just order items by the field values, or is this related to some query you have, like size > 5? Could you give an example of 5 products and the order you would expect to get back? - Filip Cordas
I have attempted to make a few examples, and clarified my "main" question. Appreciate the feedback. - Nixxon

1 Answer

2
votes

First of all, I am not really sure you know what you want to do here: your definitions use words like "good" or "bad", and these terms are too vague to define a program. Here is a simple program that will do something like what you are asking:

var index = "product";
            var type = "product";

            var db = new ElasticClient(new Uri("http://localhost:9200"));

            await db.DeleteIndexAsync(index);

            // I am using dynamic data, but you can use your class as well; that is easier.
            await db.IndexAsync(new 
            {
                name = "P1", W=200, S=40, P=2500
            }, i=>i.Index(index).Type(type));

            await db.IndexAsync(new 
            {
                name = "P2", W=50, S=10, P=2000
            }, i=>i.Index(index).Type(type));

            await db.IndexAsync(new 
            {
                name = "P3", W=400, S=100, P=1000
            }, i=>i.Index(index).Type(type));

            await db.IndexAsync(new 
            {
                name = "P4", W=200, S=45, P=3000
            }, i=>i.Index(index).Type(type));

            await Task.Delay(1000);

            // I think the fields need some sort of normalization; this is a max-based normalization.
            var max = await db.SearchAsync<dynamic>(s =>
               s.Size(0)
               .Index(index)
               .Type(type)
               .Aggregations(aggr =>
                   aggr
                   .Max("maxWeight", f => f.Field("w"))
                   .Max("maxPrice", f => f.Field("p"))
                   .Max("maxSize", f => f.Field("s"))));

            // This calculates the factors. Dividing by the max value normalizes the multivariable data so all values are on a scale from 0 to 1.
            // The max value will always be 1 and the others will be a percentage of the max value. This only works for non-negative values.
            // You can use some other way of normalizing, but that depends on the data.
            var paramsData1 = new
            {
                Weight = (10 - 5) / max.Aggs.Max("maxWeight").Value,
                Price = 3 / max.Aggs.Max("maxPrice").Value,
                Size = 8 / max.Aggs.Max("maxSize").Value
            };

            // The first query boosts the fields based on the factors entered
            var items = await db.SearchAsync<dynamic>(s =>
                s.Index(index)
                .Type(type)
                .Query(q => q.FunctionScore(fs =>
                    fs.Functions(ff =>
                        ff.FieldValueFactor(fvf => fvf.Field("w").Factor(paramsData1.Weight))
                        .FieldValueFactor(fvf => fvf.Field("s").Factor(paramsData1.Size))
                        .FieldValueFactor(fvf => fvf.Field("p").Factor(paramsData1.Price)))
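                    .ScoreMode(FunctionScoreMode.Sum) // add the per-field scores instead of multiplying them (the default)
                    .BoostMode(FunctionBoostMode.Sum))));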

            System.Console.WriteLine("______________________________");
            foreach (var item in items.Hits)
            {
                System.Console.WriteLine($"Name:{item.Source.name};S:{item.Source.s};W:{item.Source.w};P:{item.Source.p};");
            }


            var paramsData2 = new
            {
                //this is to reverse the data since from what I can tell lower is better
                Weight =(10 - 10) / max.Aggs.Max("maxWeight").Value,
                Price = 0 / max.Aggs.Max("maxPrice").Value,
                Size = 5 / max.Aggs.Max("maxSize").Value
            };

            // You can also write your own score function by hand if needed and do some sort of calculation.
            var itemsScript = await db.SearchAsync<dynamic>(s =>
                s.Index(index)
                .Type(type)
                .Query(q => q.FunctionScore(fs => fs.Functions(ff =>
                    ff.ScriptScore(
                    ss =>
                        ss.Script(script => script.Params(p =>
                            p.Add("Weight", paramsData2.Weight)
                            .Add("Price", paramsData2.Price)
                            .Add("Size", paramsData2.Size))
                            .Inline("params.Weight * doc['w'].value + params.Price * doc['p'].value + params.Size * doc['s'].value")))))));

            System.Console.WriteLine("______________________________");
            foreach (var item in itemsScript.Hits)
            {
                System.Console.WriteLine($"Name:{item.Source.name};S:{item.Source.s};W:{item.Source.w};P:{item.Source.p};");
            }
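
One thing to watch in the factors above: `(10 - importance)` only shrinks a field's factor, it does not flip its direction, so an importance of 10 for weight ends up as a factor of 0. A possible alternative for "lower is better" fields is to score `1 - value / max` instead. The following is a minimal client-side sketch of that idea, not a NEST query. `ProductRow` and `PreferenceScoring.Rank` are hypothetical names introduced only for illustration, and it assumes weight and price are "lower is better", size is "higher is better", and every maximum is greater than zero.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical plain-data product mirroring the W/S/P fields used above.
public sealed class ProductRow
{
    public string Name { get; set; }
    public double W { get; set; } // weight: assumed "lower is better"
    public double S { get; set; } // size: assumed "higher is better"
    public double P { get; set; } // price: assumed "lower is better"
}

public static class PreferenceScoring
{
    // Ranks products by a weighted sum of max-normalized field values.
    // The importances are the user's 0-10 scores. Assumes each max is > 0.
    public static List<KeyValuePair<string, double>> Rank(
        IReadOnlyList<ProductRow> products,
        double weightImportance, double priceImportance, double sizeImportance)
    {
        double maxW = products.Max(p => p.W);
        double maxS = products.Max(p => p.S);
        double maxP = products.Max(p => p.P);

        return products
            .Select(p => new KeyValuePair<string, double>(
                p.Name,
                // "lower is better" fields are inverted so a small value scores high
                weightImportance * (1 - p.W / maxW)
                + priceImportance * (1 - p.P / maxP)
                // "higher is better" fields are used as a plain fraction of the max
                + sizeImportance * (p.S / maxS)))
            .OrderByDescending(kv => kv.Value)
            .ToList();
    }
}
```

With the four products from the question and Weight=10, Price=0, Size=5 this gives P4 (10.0), P2 (about 9.86), P1 (about 9.44), P3 (5.0). That is close to, but not the same as, the hand-ranked order in the question, so the weighting or the normalization would still need tuning. The same expression could in principle be moved into a script_score script with the maxima passed in as params.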

But this is just a start. Factor analysis is a field of study by itself. Here are a few links on scripting and function scoring; I hope they help.
https://www.elastic.co/guide/en/elasticsearch/painless/5.5/painless-examples.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/script-score.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html#scoring-theory
https://jontai.me/blog/2013/01/advanced-scoring-in-elasticsearch/ (in this one the syntax is out of date, but the logic still stands)
https://qbox.io/blog/optimizing-search-results-in-elasticsearch-with-scoring-and-boosting