So I'm trying to calculate the distance between pairs of fairly long strings (about 20-100 characters each). The obstacle is performance: I need to run 20k distance comparisons, and it currently takes hours.
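For reference, here is a minimal sketch of the classic dynamic-programming Levenshtein distance. It is not necessarily the implementation being benchmarked here; it just makes the cost visible: every comparison fills an n × m table, so 100-character strings need about 10,000 cell updates per pair. The class name `LevenshteinSketch` is hypothetical.

```java
// Hypothetical helper class: classic two-row dynamic-programming Levenshtein.
final class LevenshteinSketch {
    // Returns the minimum number of single-character insertions, deletions
    // and substitutions turning a into b. Time O(n * m), extra memory O(m).
    static int distance(String a, String b) {
        int n = a.length(), m = b.length();
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        // Row 0: distance from "" to each prefix of b is the prefix length.
        for (int j = 0; j <= m; j++) prev[j] = j;
        for (int i = 1; i <= n; i++) {
            curr[0] = i; // distance from a's first i chars to ""
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                            prev[j] + 1),     // deletion
                                   prev[j - 1] + cost);       // substitution / match
            }
            // Reuse the arrays: the row just computed becomes "previous".
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }
}
```

For example, `LevenshteinSketch.distance("kitten", "sitting")` is 3.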
After investigating, I came across a few algorithms, and I'm having trouble deciding which one to choose (trading off performance vs. accuracy).
https://github.com/tdebatty/java-string-similarity lists the performance of each of these algorithms.
**EDIT:**
- Is SIFT4 algorithm well-proven / reliable?
- Is SIFT4 the right algorithm for the task?
- How come it's so much faster than the LCP-based / Levenshtein algorithms?
- Is SIFT also used in image processing, or is it a different thing? (Answered by AMH.)
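To make the speed question concrete, here is a sketch of what I understand to be the simplest SIFT4 variant published by its author (originally in JavaScript), ported to Java. Treat it as an approximation under that assumption, not the java-string-similarity implementation; the class name `Sift4Sketch` and the `maxOffset` parameter name are illustrative. Instead of filling an n × m table, it walks both strings with two cursors and, on a mismatch, looks at most `maxOffset` characters ahead to realign.

```java
// Hypothetical helper class: a Java port of the "simplest" SIFT4 variant.
final class Sift4Sketch {
    // Approximate edit distance between s1 and s2. On a mismatch, it searches
    // at most maxOffset positions ahead in either string for a realignment.
    static int distance(String s1, String s2, int maxOffset) {
        int l1 = s1.length(), l2 = s2.length();
        if (l1 == 0) return l2;
        if (l2 == 0) return l1;
        int c1 = 0, c2 = 0; // cursors into s1 and s2
        int lcss = 0;       // total length of common runs found so far
        int localCs = 0;    // length of the current common run
        while (c1 < l1 && c2 < l2) {
            if (s1.charAt(c1) == s2.charAt(c2)) {
                localCs++;
            } else {
                lcss += localCs;
                localCs = 0;
                if (c1 != c2) {
                    // Resynchronise both cursors on the furthest one
                    // (this variant does not count transpositions).
                    c1 = c2 = Math.max(c1, c2);
                }
                // Bounded lookahead: this is what keeps the cost low.
                for (int i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++) {
                    if (c1 + i < l1 && c2 < l2 && s1.charAt(c1 + i) == s2.charAt(c2)) {
                        c1 += i; // skip ahead in s1
                        localCs++;
                        break;
                    }
                    if (c2 + i < l2 && c1 < l1 && s1.charAt(c1) == s2.charAt(c2 + i)) {
                        c2 += i; // skip ahead in s2
                        localCs++;
                        break;
                    }
                }
            }
            c1++;
            c2++;
        }
        lcss += localCs;
        return Math.max(l1, l2) - lcss;
    }
}
```

Each loop iteration advances both cursors and does at most about 2 × `maxOffset` character comparisons, so the work grows roughly with string length times `maxOffset` rather than with the product of the two lengths. The price is accuracy: `Sift4Sketch.distance("ab", "ba", 5)` returns 1, while the exact Levenshtein distance is 2.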
Thanks.