Performance issue: edit distance for large strings (LCP vs Levenshtein vs SIFT)

Question

I'm trying to calculate the distance between two large strings (about 20-100). The obstacle is performance: I need to run 20k distance comparisons, and it takes hours.

After investigating, I came across a few algorithms, and I'm having trouble deciding which one to choose (based on performance vs. accuracy).

https://github.com/tdebatty/java-string-similarity - performance list for each of the algorithms.

** EDITED **

Is the SIFT4 algorithm well-proven / reliable?
Is SIFT4 the right algorithm for the task?
How come it's so much faster than the LCP-based / Levenshtein algorithms?
Is SIFT also used in image processing, or is that a different thing? (answered by AMH)

Thanks.

Alireza Alireza · Accepted Answer · 2017-06-05T18:19:05

As far as I know, Scale-invariant feature transform (SIFT) is a computer vision algorithm that detects and describes local features in images.

Also, if you want to find similar images, you must compare the local features of the images to each other by calculating their distance, which may be close to what you intend to do. But local features are vectors of numbers, as I remember. It uses a Brute-Force matcher: Feature Matching - OpenCV Library - SIFT

Please read about SIFT here: http://docs.opencv.org/3.1.0/da/df5/tutorial_py_sift_intro.html
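To illustrate what brute-force matching of local features means, here is a minimal pure-Python sketch. It is not OpenCV's API: the function name `brute_force_match` and the tiny 2-D "descriptors" are made up for illustration (real SIFT descriptors are much longer vectors). Each query vector is simply compared against every candidate vector, and the closest one by Euclidean distance wins.

```python
import math


def brute_force_match(query, train):
    """For each query vector, find the nearest train vector.

    Returns a list of (query_index, train_index, distance) tuples.
    Hypothetical helper for illustration, not an OpenCV function.
    """
    matches = []
    for qi, q in enumerate(query):
        # Compare this query vector against every train vector.
        best = min(range(len(train)), key=lambda ti: math.dist(q, train[ti]))
        matches.append((qi, best, math.dist(q, train[best])))
    return matches


# Toy descriptors (assumed values, only to show the call shape).
query = [(0.0, 0.0), (5.0, 5.0)]
train = [(4.0, 6.0), (1.0, 0.0)]
matches = brute_force_match(query, train)
```

So the "distance" here is between vectors of numbers, not between strings, which is why image SIFT does not apply directly to string comparison.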

The SIFT4 mentioned in the link you provided is a completely different thing.
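For comparison on the string side, Levenshtein is an exact dynamic-programming distance whose cost grows with the product of the two string lengths. Below is a minimal Python sketch that keeps only two rows of the table. It is not the implementation from the linked Java library, and the function name `levenshtein` is just illustrative. Assuming the 20-100 in the question means characters, this is at most 100 × 100 = 10,000 cell updates per pair, or roughly 2 × 10^8 for 20k comparisons.

```python
def levenshtein(a: str, b: str) -> int:
    """Exact Levenshtein distance using two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    # previous[j] = distance between the empty prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]
```

For example, `levenshtein("kitten", "sitting")` returns 3.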
