I would like to use Lucene's fuzzy search, which I understand is based on some sort of Levenshtein-like algorithm. If I use a fairly high threshold (i.e, "new york~0.9"), will it first compute the edit distance and then see if it is less than whatever 0.9 corresponds to, or will it cut off the algorithm if it becomes apparent that the document does not match the query that closely? I understand that that is possible with the levenshtein algorithm.
3
votes
1 Answers
2
votes
will it cut off the algorithm if it becomes apparent that the document does not match the query that closely?
No. The code you want to see is lines 57-59 of FuzzyTermEnum:
int dist = editDistance(text, target, textlen, targetlen);
distance = 1 - ((double)dist / (double)Math.min(textlen, targetlen));
return (distance > FUZZY_THRESHOLD);
You can see that it calculates the distance, then returns if that is less than the threshold.
Why do you care about this though? Unless your terms are thousands of characters long, calculating the full edit distance will be really quick.