I'm using Elasticsearch v1.7.3.

Here are my fields in the document:

       Field1, Field2, Field3, Field4

I need to assign a weight to each field, say Field1: 40, Field2: 40, Field3: 10, Field4: 10.

During indexing, Field1 and Field2 are expanded into their phonetic tokens, so we have Field1 ==> Field1, Field1.1, Field1.2 and Field2 ==> Field2, Field2.1, Field2.2.

My query can be based on a combination of any of the above 4 fields.

Now for scoring, I don't want to use TF/IDF or BM25 scoring models.

Rather I simply want to compute the weighted average per field and sum them together.

For example for input query:

Field1: ABC
Field2: PQR
Field3: XYZ
Field4: RST

Assume there are the following documents in the corpus:

Document 1
-----------
Field1: ABC
Field2: PQR
Field3: XYZ
Field4: RST

Document 2
-----------
Field1: ABX
Field2: PQR
Field3: XYZ
Field4: RST

Score for Document 1: 100 ==> WeightedAverage(Field1) + WeightedAverage(Field2) + WeightedAverage(Field3) + WeightedAverage(Field4) ===> 40 + 40 + 10 + 10

Score for Document 2: 90 ==> WeightedAverage(Field1) + WeightedAverage(Field2) + WeightedAverage(Field3) + WeightedAverage(Field4) ===> 30 + 40 + 10 + 10 (Not exactly but I hope you get the idea).
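
To make the intended arithmetic concrete, here is a minimal Python sketch of the scoring I have in mind, computed outside of Elasticsearch. It assumes all-or-nothing credit per field (an exact match earns the full weight), so Document 2 comes out at 60 here rather than the partial-credit 90 above. The function and variable names are only illustrative.

```python
# Field weights from the question.
WEIGHTS = {"Field1": 40, "Field2": 40, "Field3": 10, "Field4": 10}


def weighted_score(query, doc, weights=WEIGHTS):
    """Sum the weight of every queried field whose value matches exactly.

    Assumption: all-or-nothing credit per field; partial (e.g. phonetic)
    matches are not modelled here.
    """
    return sum(
        weight
        for field, weight in weights.items()
        if field in query and doc.get(field) == query[field]
    )


query = {"Field1": "ABC", "Field2": "PQR", "Field3": "XYZ", "Field4": "RST"}
doc1 = {"Field1": "ABC", "Field2": "PQR", "Field3": "XYZ", "Field4": "RST"}
doc2 = {"Field1": "ABX", "Field2": "PQR", "Field3": "XYZ", "Field4": "RST"}

print(weighted_score(query, doc1))  # 100
print(weighted_score(query, doc2))  # 60
```

Getting something like 30 for Field1 in Document 2 would need extra credit for the phonetic variants (Field1.1, Field1.2), which this sketch leaves out.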

Can I do this with a function_score query? I couldn't quite work out how it can be accomplished. Thanks.

What's the language/framework you're using? - Aysennoussi

1 Answer

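Take a look at the function_score query. Inside function_score, define one function per field, each with a filter on that field and a boost (40 or 10), and set score_mode to sum so that the boosts of the matching functions are added together. boost_mode then controls how that total is combined with the score of the main query (sum adds the two, replace uses the function total alone).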

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html

{
    "functions": [
        {
            "filter": {
                "query": {
                    "bool": {
                        "should": [
                            {
                                "match": {
                                    "inputloc1": "ABC"
                                }
                            }
                        ]
                    }
                }
            },
            "boost_factor": 11
        },
        {
            "filter": {
                "query": {
                    "bool": {
                        "should": [
                            {
                                "query_string": {
                                    "fields": [
                                        "input"
                                    ],
                                    "query": "xyz",
                                    "fuzziness": 0,
                                    "fuzzy_prefix_length": 0
                                }
                            }
                        ]
                    }
                }
            },
            "boost_factor": 6
        }
    ],
    "score_mode": "sum",
    "boost_mode": "sum"
}

I took the example functions from my own code, but you can switch all of the queries to match (instead of query_string). Whatever you define inside functions only affects the score. Whatever you define in the query part of function_score is what actually filters the documents.
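
For the four fields in the question, the request body could be put together roughly like this. It is only a sketch in Python that builds the JSON, not something I have run against a 1.7.3 cluster. It assumes a bool/should of match clauses as the outer query, one boost_factor function per queried field, score_mode sum, and boost_mode replace so that the outer query's own score is ignored. The helper name build_weighted_query is made up for illustration.

```python
import json

# Field weights from the question.
WEIGHTS = {"Field1": 40, "Field2": 40, "Field3": 10, "Field4": 10}


def build_weighted_query(terms, weights=WEIGHTS):
    """Build a function_score body with one scoring function per queried field."""
    clauses = []
    functions = []
    for field, value in terms.items():
        clause = {"match": {field: value}}
        clauses.append(clause)
        functions.append({
            # "query" filter: wraps a query so it can be used as a filter,
            # the same shape as in the JSON example above.
            "filter": {"query": clause},
            "boost_factor": weights[field],
        })
    return {
        "query": {
            "function_score": {
                # The outer query decides which documents are returned.
                "query": {"bool": {"should": clauses}},
                "functions": functions,
                "score_mode": "sum",      # add up the matching functions
                "boost_mode": "replace",  # ignore the outer query's own score
            }
        }
    }


body = build_weighted_query({"Field1": "ABC", "Field3": "XYZ"})
print(json.dumps(body, indent=2))
```

Partial credit for the phonetic variants could probably be handled the same way, by adding further functions on Field1.1, Field1.2 and so on with smaller boosts, but that part is an assumption on my side.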

Hope this helps.