0
votes

Given two strings what is an efficient algorithm to find the number and length of longest common sub-strings with the sub-strings being called common if :
1) they have at-least x% characters same and at same position.
2) the start and end indexes of the sub-strings being same.

Ex :
String 1 -> abedefkhj
String 2 -> kbfdfjhlo

suppose the x% being asked is 40,then, ans is,

5 1
where 5 is the longest length and 1 is the number of sub-strings in each string satisfying the given property. Sub-String is "abede" in string 1 and "kbfdf" in string 2.

1
Brute-forcing this (finding all sub-strings and comparing their % of matching characters) is your last resort I guess?Sash
Yes, I have asked it here in hope that it isn't the last and only resort.user3098992

1 Answers

1
votes

You can use smth like Levenshtein distance without deleting and inserting.

Build the table, where every element [i, j] is error for substring from position [i] to position [j].

foo(string a, string b, int x):
    len = min(a.length, b.length)
    error[0][0] = 0 if a[0] == b[0] else 1;
    for (end: [1 -> len-1]):
        for (start: [end -> 0]):
            if a[end] == b[end]:
                error[start][end] = error[start][end - 1]
            else:
                error[start][end] = error[start][end - 1] + 1
    best_len = 0;
    best_pos = 0;
    for (i: [0 -> len-1]):
        for (j: [i -> 0]):
            len = i - j + 1
            error_percent = 100 * error[i][j] / len
            if (error_percent <= x and len > best_len):
                best_len = len
                best_pos = j
   return (best_len, best_pos)