What is the difference between Lucene's MoreLikeThis (mlt) and FuzzyLikeThis (flt)?
I am evaluating both query types through Elasticsearch (ES) and I found they are conceptually very similar:
mlt: compares an existing document's fields with other documents' fields
vs
flt: compares a string with other documents' fields
However, flt performance seems to be about an order of magnitude slower than the mlt query.
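For concreteness, here is a minimal sketch of the two query bodies being compared, written as Python dicts. It assumes the like_text form of both queries in the ES query DSL of that era. The field names and the text are placeholders, not real data, and the parameter values are only illustrative.

```python
import json

# Placeholder field names and input text (assumptions for illustration only).
FIELDS = ["title", "body"]
LIKE_TEXT = "text like this one"

# more_like_this: selects interesting terms from the text and matches them exactly.
mlt_query = {
    "query": {
        "more_like_this": {
            "fields": FIELDS,
            "like_text": LIKE_TEXT,
            "min_term_freq": 1,
            "max_query_terms": 12,
        }
    }
}

# fuzzy_like_this: same idea, but every term is first expanded to fuzzy variants.
flt_query = {
    "query": {
        "fuzzy_like_this": {
            "fields": FIELDS,
            "like_text": LIKE_TEXT,
            "max_query_terms": 12,
        }
    }
}

if __name__ == "__main__":
    # Print both bodies so they can be pasted into a search request.
    print(json.dumps(mlt_query, indent=2))
    print(json.dumps(flt_query, indent=2))
```

The bodies are nearly identical, which is why the two queries look so similar on the surface. The difference is in how the terms from the text are expanded before matching.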
I'm using the latest ES, which in turn uses Lucene 4.5.
From the fuzzy like this docs:
Fuzzifies ALL terms provided as strings and then picks the best n differentiating terms. In effect this mixes the behaviour of FuzzyQuery and MoreLikeThis but with special consideration of fuzzy scoring factors. This generally produces good results for queries where users may provide details in a number of fields and have no knowledge of boolean query syntax and also want a degree of fuzzy matching and a fast query.
For each source term the fuzzy variants are held in a BooleanQuery with no coord factor (because we are not looking for matches on multiple variants in any one doc). Additionally, a specialized TermQuery is used for variants and does not use that variant term’s IDF because this would favor rarer terms, such as misspellings. Instead, all variants use the same IDF ranking (the one for the source query term) and this is factored into the variant’s boost. If the source query term does not exist in the index the average IDF of the variants is used.
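The scoring rule in the quoted passage can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Lucene's code. The document frequencies are invented, the IDF formula is the classic 1 + ln(numDocs / (docFreq + 1)) form, and the variant lookup is a brute-force scan over a tiny vocabulary with a plain Levenshtein distance. All function names are hypothetical.

```python
import math

def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def idf(doc_freq: int, num_docs: int) -> float:
    """Classic-style IDF (an assumption of this sketch)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def fuzzy_variants(term, doc_freqs, max_edits=1):
    """Brute-force 'fuzzify' step: indexed terms within max_edits of the term."""
    return sorted(t for t in doc_freqs if edit_distance(term, t) <= max_edits)

def shared_idf(term, variants, doc_freqs, num_docs):
    """One IDF for all variants: the source term's, else the variants' average."""
    if term in doc_freqs:
        return idf(doc_freqs[term], num_docs)
    if not variants:
        return 0.0
    return sum(idf(doc_freqs[v], num_docs) for v in variants) / len(variants)

# Toy index statistics (invented numbers).
NUM_DOCS = 1000
DOC_FREQS = {"lucene": 120, "lucine": 2, "lucent": 15, "search": 300}

if __name__ == "__main__":
    for source in ("lucene", "lucenn"):
        variants = fuzzy_variants(source, DOC_FREQS)
        print(source, variants, shared_idf(source, variants, DOC_FREQS, NUM_DOCS))
```

In this toy index the misspelling "lucine" appears in only 2 documents, so its own IDF would be higher than that of "lucene". Giving every variant the source term's IDF stops the misspelling from outranking the correct term. The sketch also shows that each source term needs its own variant lookup before any scoring happens, which plain term matching does not need.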