Elasticsearch's "more_like_this" query lets users find documents similar to a given document, identified by its ID.

I have a query to find documents that are similar to a searched document on specific fields (i.e., title, brand, category_name).

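# `es` is an elasticsearch.Elasticsearch client; INDEX_NAME and TYPE_NAME are defined elsewhere.
es.search(index=INDEX_NAME, body={
    "query": {
        "more_like_this": {
            # Fields to compare on
            "fields": ["title", "brand", "category_name"],
            # The document to find similar documents for
            "like": [
                {
                    "_index": INDEX_NAME,
                    "_type": TYPE_NAME,
                    "_id": "8117769"
                }
            ],
            "min_term_freq": 2,
            "max_query_terms": 25
        }
    }
})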

I had the impression that it would match the searched document's title field against other documents' title fields, brand against brand, and category name against category name. However, the results seem to suggest otherwise. Instead, it appears to combine the text from the searched document's title, brand, and category fields, and then search all of those fields with the combined text.

Is there a way to limit the more_like_this query to match field with field, instead of combining the fields and matching on all of them?

Additional background on more_like_this behaviour, quoted from: Elasticsearch "More Like This" API vs. more_like_this query

The more like this api goes one step further, allowing to provide the id of a document and, again, a list of fields. The content of those fields will be extracted from that specific document and used to make a more like this query on the same fields. That means that the generated more like this query will have the property text containing the text previously extracted and will be performed on the same fields. As you can see the more like this api executes a more like this query under the hood.
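To make the field-with-field behaviour I am after concrete, here is a minimal, untested sketch of one possible approach: a bool query with one more_like_this clause per field. It assumes that restricting each clause to a single field keeps both term extraction and matching within that field. The helper name `per_field_mlt_query` and the index name "products" are hypothetical, made up for illustration only.

```python
def per_field_mlt_query(index_name, doc_id, fields,
                        min_term_freq=2, max_query_terms=25):
    """Build a search body with one more_like_this clause per field.

    Hypothetical helper: it only constructs the request body as a dict
    and does not talk to Elasticsearch.
    """
    like_doc = {"_index": index_name, "_id": doc_id}
    clauses = [
        {
            "more_like_this": {
                # A single field per clause, so title is only compared
                # with title, brand with brand, and so on (assumption).
                "fields": [field],
                "like": [like_doc],
                "min_term_freq": min_term_freq,
                "max_query_terms": max_query_terms,
            }
        }
        for field in fields
    ]
    # "should" lets a document match on any subset of the fields;
    # matching on more fields contributes more to the score.
    return {"query": {"bool": {"should": clauses}}}


# "products" is a placeholder index name.
body = per_field_mlt_query("products", "8117769",
                           ["title", "brand", "category_name"])
```

The resulting `body` could then be passed to `es.search(index=..., body=body)` as in the query above. One thing I am unsure about: short fields such as brand rarely repeat a term, so `min_term_freq` may need to be lowered to 1 for the per-field clauses to select any terms at all.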