5
votes

Current Situation

I am using the percolate feature of Elasticsearch. It works well: I get the matching percolator IDs back for a new document, which basically lets me build an inverse search. So far, so good.

Problem

Here comes the problem: I want a score expressing how well the given document matches a percolator's query (exactly the score a normal query would give me). To get this I added track_scores, but had no luck.

I found this in the documentation for track_scores:

...The score is based on the query and represents how the query matched to the percolate query’s metadata and not how the document being percolated matched to the query...

Is what I want/need even possible?

Example showing the problem

Here is a sample demonstrating the problem (taken from elasticsearch.org). The score returned in the percolate response is always 1.0, regardless of the input document:

Index the percolator:

curl -XPUT 'localhost:9200/my-index/.percolator/1' -d '{
    "query" : {
        "match" : {
            "message" : "bonsai tree"
        }
    }
}'

Percolate first document:

curl -XGET 'localhost:9200/my-index/message/_percolate' -d '{
    "doc" : {
        "message" : "A new bonsai tree in the office"
    },
    "track_scores" : "true"
}'


//...returns
{"took": 1, "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
}, "total": 1, "matches": [
    {
        "_index": "my-index",
        "_id": "1",
        "_score": 1.0 <-- Score
    }
]}

Percolate a second (different) document:

curl -XGET 'localhost:9200/my-index/message/_percolate' -d '{
    "doc" : {
        "message" : "A new bonsai tree in the office next to another bonsai tree is cool!"
    },
    "track_scores" : "true"
}'


//...returns
{"took": 3, "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
}, "total": 1, "matches": [
    {
        "_index": "my-index",
        "_id": "1",
        "_score": 1.0 <-- SAME Score, but different document (other score needed here!)
    }
]}

What I need

I want a score of something like 0.8 for the first document and something like 0.9 for the second one. They should not get the same score, as they did here. How can I achieve this?

Thanks a lot for any idea and help.

2
Does anyone know if Elastic plans to support this functionality in future versions? I don't understand the technical roadblock to adding it. - JimSTAT

2 Answers

3
votes

A score is relative to the other documents in the data set. You could potentially do some sort of custom scoring that focuses only on the term frequency/inverse document frequency of the document at hand. That probably won't be terribly effective, but it might be good enough.
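
To make the custom-scoring idea a bit more concrete, here is a minimal sketch in Python that scores a single document against the terms of a percolator query using term frequency only. This is an assumption-laden illustration, not how Elasticsearch computes _score: it ignores inverse document frequency and length normalization, uses a naive regex tokenizer instead of a real analyzer, and the names tokenize and local_tf_score are made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumerics; a crude stand-in for a real analyzer.
    return re.findall(r"[a-z0-9]+", text.lower())


def local_tf_score(query_text, doc_text):
    # Sum sqrt(term frequency) over the distinct query terms.
    # Terms missing from the document contribute 0.
    term_counts = Counter(tokenize(doc_text))
    return sum(math.sqrt(term_counts[term]) for term in set(tokenize(query_text)))


first = local_tf_score("bonsai tree", "A new bonsai tree in the office")
second = local_tf_score(
    "bonsai tree",
    "A new bonsai tree in the office next to another bonsai tree is cool!",
)
print(first, second)
```

With the two sample documents from the question, the second one scores higher because both query terms occur twice. You would have to run this on the client side after the percolate call returns the matching IDs, and the numbers are only comparable between documents scored the same way.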

I am not sure whether this is a viable solution for your problem, but one approach would be to re-run all matching percolator queries against the whole data set, grab your document's score from that, and re-index the document with that data. Since it is all relative, this would potentially require you to then update all the other documents matching the query. It would likely be best to do the global re-score at some set interval.

-1
votes

Your percolate request does not define a query to limit the search space. The _score is calculated from that query, not from the queries you percolate against.