Elasticsearch supports searching for similar documents via its "more-like-this" (MLT) query. I'm trying to better understand and tune this query so that it returns more relevant similar documents.
While experimenting with it, I've found that a single MLT query over multiple fields returns different results from a bool query that combines multiple MLT queries with one field each. Samples below (truncated):
Single MLT query with multiple fields
es.search(index=INDEX_NAME, body={'query': {
    'more_like_this': {
        'fields': ['title', 'category_name', 'brand'],
        'like': []  # truncated in this sample
    }
}})
Multiple MLT queries with single field
es.search(index=INDEX_NAME, body={'query': {
    'bool': {
        'should': [
            {'more_like_this': {
                'fields': ['title'],
                'like': [],  # truncated in this sample
            }},
            {'more_like_this': {
                'fields': ['category_name'],
                'like': [],
            }},
            {'more_like_this': {
                'fields': ['brand'],
                'like': [],
            }},
        ]
    }
}})
Why does this happen?
My understanding is that an MLT query combines the text from all the fields listed in a single query and uses it to search through the documents. However, the title, category_name, and brand fields share no text, so I would expect both approaches to return the same results. They do not, and the version with multiple MLT queries actually returns better matches.
I apologise if this question has no straightforward answer. I'm looking for a better understanding, from anyone experienced with Elasticsearch, of how to improve the returned results.
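To make the difference concrete, the two result sets can be compared side by side. The following is only a minimal sketch: the helper names (`single_mlt`, `per_field_mlt`, `hit_ids`, `overlap`) are hypothetical and not part of any library, and the `like` payload is left to the caller because it is truncated in the samples. It assumes the usual search response shape, where hits are found under `response["hits"]["hits"]` and each hit has an `_id`.

```python
def single_mlt(fields, like):
    """Build one MLT query that covers all fields at once."""
    return {"query": {"more_like_this": {"fields": list(fields), "like": like}}}


def per_field_mlt(fields, like):
    """Build a bool/should query with one MLT clause per field."""
    clauses = [
        {"more_like_this": {"fields": [field], "like": like}}
        for field in fields
    ]
    return {"query": {"bool": {"should": clauses}}}


def hit_ids(response):
    """Extract document ids, in ranked order, from a search response."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def overlap(ids_a, ids_b):
    """Jaccard overlap between two lists of ids (1.0 means identical sets)."""
    a, b = set(ids_a), set(ids_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

With the client from the samples, `hit_ids(es.search(index=INDEX_NAME, body=single_mlt(fields, like)))` and the same call with `per_field_mlt(fields, like)` give two id lists that can be passed to `overlap` to quantify how far the rankings diverge.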
If you have time, here's a previous question I posted on MLT which remains unanswered: Elasticsearch "more_like_this" query specific to fields