
I'm using Elasticsearch for this project, but a Solr solution might be appropriate too. I'd like the query to include a should clause that returns results even if none of the other terms match. This will be used for document popularity: I'll periodically calculate reading popularity and store it on each doc as a float field.

The idea is to return docs based on terms, but when that fails, to return popular docs ranked by popularity. Results should be ordered by term match score or, failing that, by the magnitude of the popularity score.

I realize that I could quantize the popularity and treat it like a tag ("hottest", "hotter", "hot", ...), but I would like to use a numeric field since the ranking is well defined.

Here is the current form of my data (from fetch by id):

GET /index/docs/doc1

returns a sample object

{
   "_index": "index",
   "_type": "docs",
   "_id": "doc1",
   "_version": 1,
   "found": true,
   "_source": {
      "category": ["tablets", "electronics"],
      "text": ["buy", "an",  "ipad"],
      "popularity": 0.95347457,
      "id": "doc1"
   }
}

Current query format

POST /index/docs/_search
{
   "size": 10,
   "query": {
      "bool": {
         "should": [
            {"terms": {"text": ["ipad"]}}
         ],
         "must": [
            {"terms": {"category": ["electronics"]}}
         ]
      }
   }
}

This may seem an odd query format, but these are structured objects, not free-form text.

Can I add popularity to this query so that it returns items ranked by popularity magnitude along with those returned by the should terms? I'd boost the actual terms above the popularity so they'd be favored.

Note that I do not want to boost by popularity; I want to return popular docs if the rest of the query returns nothing.

Would sorting by score, then by popularity, not work? - keety
I'm looking for something that will return hits even if nothing else does. Imagine the query above hitting nothing: that is the case where I want popularity to act as the fallback, so that every query returns something, if only the most popular docs. - pferrel

2 Answers

1
votes

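One approach I can think of is to wrap a match_all filter in a constant_score query with a boost of 0, add it as a should clause, and then sort on score followed by popularity. The match_all clause guarantees hits without adding to the score, and popularity orders the docs whose scores tie.

Example: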

{
   "size": 10,
   "query": {
      "bool": {
         "should": [
            {
               "terms": {
                  "text": [
                     "ipad"
                  ]
               }
            },
            {
               "constant_score": {
                  "filter": {
                     "match_all": {}
                  },
                  "boost": 0
               }
            }
         ],
         "must": [
            {
               "terms": {
                  "category": [
                     "electronics"
                  ]
               }
            }
         ],
         "minimum_should_match": 1
      }
   },
   "sort": [
      {
         "_score": {
            "order": "desc"
         }
      },
      {
         "popularity": {
            "order": "desc",
            "unmapped_type": "double"
         }
      }
   ]
}
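
To see why this ordering gives the fallback behaviour, here is a small Python sketch that imitates the two-level sort locally. It is only a toy model: the docs and scores are invented, `rank` is a hypothetical helper rather than part of any Elasticsearch client, and it ignores whatever the must clause contributes to the real score.

```python
# Toy model of the sort above: primary key is the score (descending),
# tie-breaker is popularity (descending). All values are invented.

def rank(hits, size=10):
    """Order hits by score, then by popularity, both descending."""
    ordered = sorted(
        hits,
        key=lambda h: (h["score"], h["popularity"]),
        reverse=True,
    )
    return ordered[:size]

# Case 1: doc1 matches the term, so it gets a non-zero score.
with_match = [
    {"id": "doc1", "score": 1.3, "popularity": 0.10},
    {"id": "doc2", "score": 0.0, "popularity": 0.95},
    {"id": "doc3", "score": 0.0, "popularity": 0.40},
]

# Case 2: nothing matches the term, so only the zero-boost
# match_all clause hits and every score is the same.
no_match = [dict(h, score=0.0) for h in with_match]

print([h["id"] for h in rank(with_match)])  # ['doc1', 'doc2', 'doc3']
print([h["id"] for h in rank(no_match)])    # ['doc2', 'doc3', 'doc1']
```

In the first case the term match wins despite its low popularity; in the second case the list is ordered purely by popularity.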
1
votes

You want to look into the function score query, combined with a decay function, for this.

Here's a gentle intro: https://www.found.no/foundation/function-scoring/
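
As a rough sketch of what that could look like, the Python below builds a function_score request body around the bool query from the question and adds a gauss decay on the popularity field. The `origin` and `scale` values are placeholder assumptions (they presume popularity tops out around 1.0), and `gauss_decay` is a hypothetical local helper that follows the documented gauss formula so you can get a feel for the numbers; it is not an Elasticsearch API.

```python
import json
import math

def gauss_decay(value, origin=1.0, scale=0.5, offset=0.0, decay=0.5):
    """Local approximation of the gauss decay: 1.0 at origin,
    falling to `decay` at a distance of `scale` from it."""
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))

# Request body sketch. Field names come from the question;
# origin, scale and boost_mode are assumptions to tune.
query = {
    "size": 10,
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "must": [{"terms": {"category": ["electronics"]}}],
                    "should": [{"terms": {"text": ["ipad"]}}],
                }
            },
            "functions": [
                {"gauss": {"popularity": {"origin": 1.0, "scale": 0.5}}}
            ],
            # "sum" adds the decay value to the query score
            # instead of multiplying the two together.
            "boost_mode": "sum",
        }
    },
}

print(json.dumps(query, indent=2))
print(gauss_decay(0.95347457))  # decay value for the sample doc
```

Note that this blends popularity into the score of every hit, which is different from a pure fallback, so it may or may not fit what the question asks for.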