How to boost specific documents for a given search term in Elasticsearch?

Question

I need your help with index design for a real-world scenario. It might be a long question; let me try to explain it as concisely as possible.

We are building a search platform based on Elasticsearch to provide a site search experience for our customers. The documents in the index look something like this:

{ "Path":"http://www.foo.com/doc/abc/1", "Title":"Title 1", "Description":"The description of doc 1", ... }
{ "Path":"http://www.foo.com/doc/abc/2", "Title":"Title 2", "Description":"The description of doc 2", ... }
{ "Path":"http://www.foo.com/doc/abc/3", "Title":"Title 3", "Description":"The description of doc 3", ... }
...

By default, the hits returned for each query are sorted by relevance, but our customer also wants to boost specific documents for certain keywords.

They give us a boosting configuration XML like the following:

<boost>
    <Keywords value="keyword1">
        <Path rank="10000">http://www.foo.com/doc/abc/1</Path>
    </Keywords>

    <Keywords value="keyword2">
        <Path rank="10000">http://www.foo.com/doc/abc/2</Path>
        <Path rank="9900">http://www.foo.com/doc/abc/1</Path>
    </Keywords>

    <Keywords value="keyword3">
        <Path rank="10000">http://www.foo.com/doc/abc/3</Path>
        <Path rank="9900">http://www.foo.com/doc/abc/2</Path>
        <Path rank="9800">http://www.foo.com/doc/abc/1</Path>
    </Keywords>
</boost>

That means that if a user searches for "keyword1", the top hit should be the document whose Path field value is "http://www.foo.com/doc/abc/1", regardless of that document's relevance score. Similarly, for a search on "keyword3", the top 3 hits should be the documents whose Path values are "http://www.foo.com/doc/abc/3", "http://www.foo.com/doc/abc/2" and "http://www.foo.com/doc/abc/1", in that order.

To satisfy this requirement, my design is to first invert the original boosting XML into the following format:

<boost>
    <Path value="http://www.foo.com/doc/abc/1">
        <keywords>
           <keyword value="keyword1" rank="10000" />
           <keyword value="keyword2" rank="9900" />
           <keyword value="keyword3" rank="9800" />
        </keywords>
    </Path>

    <Path value="http://www.foo.com/doc/abc/2">
        <keywords>
           <keyword value="keyword2" rank="10000" />
           <keyword value="keyword3" rank="9900" />
        </keywords>
    </Path> 
    <Path value="http://www.foo.com/doc/abc/3">
        <keywords>
           <keyword value="keyword3" rank="10000" />
        </keywords>
    </Path>
</boost>
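The inversion step can be scripted. The sketch below uses Python's standard `xml.etree.ElementTree` to read the keyword-oriented XML and group its entries by Path. The function name `invert_boost_config` is hypothetical, and the code assumes the element and attribute names (`Keywords`, `Path`, `value`, `rank`) shown in the original configuration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The keyword-oriented boosting config from the question.
BOOST_XML = """
<boost>
    <Keywords value="keyword1">
        <Path rank="10000">http://www.foo.com/doc/abc/1</Path>
    </Keywords>
    <Keywords value="keyword2">
        <Path rank="10000">http://www.foo.com/doc/abc/2</Path>
        <Path rank="9900">http://www.foo.com/doc/abc/1</Path>
    </Keywords>
    <Keywords value="keyword3">
        <Path rank="10000">http://www.foo.com/doc/abc/3</Path>
        <Path rank="9900">http://www.foo.com/doc/abc/2</Path>
        <Path rank="9800">http://www.foo.com/doc/abc/1</Path>
    </Keywords>
</boost>
"""


def invert_boost_config(xml_text):
    """Turn keyword -> [Path, rank] XML into {path: [{"keyword": ..., "rank": ...}]}."""
    root = ET.fromstring(xml_text.strip())
    by_path = defaultdict(list)
    for kw_elem in root.findall("Keywords"):
        keyword = kw_elem.get("value")
        for path_elem in kw_elem.findall("Path"):
            path = path_elem.text.strip()
            # Each Path element contributes one keyword/rank pair to that document.
            by_path[path].append({"keyword": keyword, "rank": int(path_elem.get("rank"))})
    return dict(by_path)


if __name__ == "__main__":
    for path, entries in invert_boost_config(BOOST_XML).items():
        print(path, entries)
```

The resulting dictionary carries the same information as the inverted XML, so it can feed document indexing directly instead of round-tripping through a second XML file.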

Then I add a nested field, "Boost", which contains an array of keyword/rank pairs, to each Elasticsearch document, as in the following example:

{
  "Boost": [
    { "keyword": "keyword1", "rank": 10000 },
    { "keyword": "keyword2", "rank": 9900 },
    { "keyword": "keyword3", "rank": 9800 }
  ],
  "Path": "http://www.foo.com/doc/abc/1",
  "Title": "Title 1",
  "Description": "The description of doc 1",
  ...
}

{
  "Boost": [
    { "keyword": "keyword2", "rank": 10000 },
    { "keyword": "keyword3", "rank": 9900 }
  ],
  "Path": "http://www.foo.com/doc/abc/2",
  "Title": "Title 2",
  "Description": "The description of doc 2",
  ...
}

{
  "Boost": [
    { "keyword": "keyword3", "rank": 10000 }
  ],
  "Path": "http://www.foo.com/doc/abc/3",
  "Title": "Title 3",
  "Description": "The description of doc 3",
  ...
}
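This design depends on "Boost" being mapped as a `nested` type. A plain array of objects is flattened by Elasticsearch, so `Boost.keyword` and `Boost.rank` would lose their pairing and a match on "keyword3" could pick up another entry's rank. A sketch of such a mapping as a Python dict, assuming Elasticsearch 7+ typeless mappings (a 2015-era 1.x cluster would need a mapping type and `string` fields instead); `BOOST_MAPPING` and `with_boost` are hypothetical names.

```python
import json

# Hypothetical mapping: each Boost entry becomes its own hidden nested
# sub-document, so keyword and rank stay associated at query time.
BOOST_MAPPING = {
    "mappings": {
        "properties": {
            "Path": {"type": "keyword"},
            "Title": {"type": "text"},
            "Description": {"type": "text"},
            "Boost": {
                "type": "nested",
                "properties": {
                    "keyword": {"type": "keyword"},  # exact-match search term
                    "rank": {"type": "integer"},     # configured rank
                },
            },
        }
    }
}


def with_boost(doc, boost_entries):
    """Return a copy of `doc` with its Boost array attached (empty if none)."""
    enriched = dict(doc)
    enriched["Boost"] = list(boost_entries)
    return enriched


if __name__ == "__main__":
    doc = with_boost(
        {"Path": "http://www.foo.com/doc/abc/3", "Title": "Title 3"},
        [{"keyword": "keyword3", "rank": 10000}],
    )
    print(json.dumps(doc, indent=2))
```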

Then, at query time, I use a nested query to get the rank value of each matched document for the given search keyword, and a scoring script to adjust the relevance score by that rank value.
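A possible shape for this query, sketched as a Python dict. Instead of a score script it swaps in `function_score` with `field_value_factor`, which reads `Boost.rank` directly; a `script_score` function could be used instead. The `bool`/`should` clause sums the text relevance and the nested rank, and the `term` query assumes the raw search string must equal the configured keyword exactly. The field list `Title`/`Description` and the name `boosted_search` are assumptions.

```python
import json


def boosted_search(keyword, text_fields=("Title", "Description")):
    """Build a query whose score is text relevance plus the configured rank (if any)."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Normal full-text relevance.
                    {"multi_match": {"query": keyword, "fields": list(text_fields)}},
                    # Only documents with a Boost entry for this exact keyword
                    # match here; their clause score is replaced by that entry's rank.
                    {
                        "nested": {
                            "path": "Boost",
                            "score_mode": "max",
                            "query": {
                                "function_score": {
                                    "query": {"term": {"Boost.keyword": keyword}},
                                    "field_value_factor": {"field": "Boost.rank"},
                                    "boost_mode": "replace",
                                }
                            },
                        }
                    },
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(boosted_search("keyword3"), indent=2))
```

Because the clauses sit in `should` with no `must`, a configured document is returned even when its text does not match the keyword at all.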

Since the rank values from the boosting XML are much larger than normal relevance scores (generally less than 5), the documents configured in the boosting XML for the given keyword should end up with the top adjusted scores.
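The arithmetic behind this claim can be checked with a small simulation, assuming the final score is simply relevance plus rank (what a summed `bool`/`should` gives). The relevance values below are made up for illustration; doc4 stands for an unboosted document that is the most relevant by text alone.

```python
# Hypothetical relevance scores for a search on "keyword3".
RELEVANCE = {"doc1": 4.2, "doc2": 0.3, "doc3": 1.7, "doc4": 4.9}
# Ranks configured for "keyword3" in the boosting XML.
RANKS_FOR_KEYWORD3 = {"doc3": 10000, "doc2": 9900, "doc1": 9800}


def final_order(relevance, ranks):
    """Sort doc ids by relevance + rank (unconfigured docs get rank 0)."""
    scores = {doc: rel + ranks.get(doc, 0) for doc, rel in relevance.items()}
    return sorted(scores, key=scores.get, reverse=True)


print(final_order(RELEVANCE, RANKS_FOR_KEYWORD3))  # ['doc3', 'doc2', 'doc1', 'doc4']
```

Note that the configured order among boosted documents only holds while relevance differences stay smaller than the 100-point gap between adjacent ranks.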

Do you think this is a good design for Elasticsearch? Any suggestions for better approaches?

Thanks in advance!

I am sorry, what do you mean by "to do with elasticsearch"? We are using Elasticsearch to build our search platform. So I am asking how Elasticsearch can do it? — Youxu
I asked because I didn't see that your question was Elasticsearch-related. So you actually want an opinion on how to do what you want with Elasticsearch? — eliasah
Got it. In my original post, I proposed an Elasticsearch solution that uses a nested object to store an array of keyword/rank pairs for each document. Then, at query time, a nested query plus a scoring script gets the rank value of each matched document and adjusts its score accordingly. With this, the top N documents should be the documents listed in the boosting XML file for the given keyword. But I am looking for better solutions, if there are any. — Youxu
I know this is an old question, but I've come across a similar requirement - any solution? — Sagi Mann

astax · Accepted Answer · 2015-04-30T12:32:41

It may be better to index the keywords in a separate field of the original documents and then, at search time, simply boost matches in that field.

This is not exactly what you described, as it doesn't give you fine-grained control over the boost factor for each keyword. But it is definitely a way to make specific documents appear higher in the search results when the query contains specific keywords.

If you really need finer control over the boost factor for different keywords, you can still use this method, but you'll need to create several "boosted keywords" fields and boost each one differently in the query.

For example:

{ "Path":"http://www.foo.com/doc/abc/1",
  "Title":"Title 1",
  "Description":"The description of doc 1",
  "boost_kw1": "keyword1 keyword2",
  "boost_kw2": "keyword3 keyword4" },
{ "Path":"http://www.foo.com/doc/abc/2",
  "Title":"Title 2",
  "Description":"The description of doc 2",
  "boost_kw1": "keyword3",
  "boost_kw2": "keyword1 keyword2" }

Then, in the query, you calculate the total score as the sum of:

the main query score
the score of the match in "boost_kw1" multiplied by 10
the score of the match in "boost_kw2" multiplied by 5
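One way to express that sum, sketched as a Python dict: each clause in a `bool`/`should` contributes its score multiplied by its `boost`, and the clause scores are added together. The main-query fields `Title`/`Description` and the name `tiered_boost_query` are assumptions; `boost_kw1`/`boost_kw2` and the factors 10 and 5 come from the example above.

```python
import json


def tiered_boost_query(user_query, main_fields=("Title", "Description")):
    """Total score = main score + 10 * boost_kw1 score + 5 * boost_kw2 score."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Main relevance query.
                    {"multi_match": {"query": user_query, "fields": list(main_fields)}},
                    # Strongest tier of boosted keywords.
                    {"match": {"boost_kw1": {"query": user_query, "boost": 10}}},
                    # Weaker tier of boosted keywords.
                    {"match": {"boost_kw2": {"query": user_query, "boost": 5}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(tiered_boost_query("keyword3"), indent=2))
```

With the two example documents, a search for "keyword3" hits `boost_kw2` on the first and `boost_kw1` on the second, so the second document should generally rank higher.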
