Is there any efficient algorithm that counts the length of Longest Common Palindromic Subsequence of two given strings?
for example:
string 1. afbcdfca
string 2. bcadfcgyfka
the LCPS is 5 and LCPS string is afcfa
.
Is there any efficient algorithm that counts the length of Longest Common Palindromic Subsequence of two given strings?
for example:
string 1. afbcdfca
string 2. bcadfcgyfka
the LCPS is 5 and LCPS string is afcfa
.
Yes.
Just use an algorithm for LCS for more than two sequences.
If I'm not mistaken, then
LCS( afbcdfca, acfdcbfa, bcadfcgyfka, akfygcfdacb) = afcfa
It's up to you to figure out strings #2 and #4.
Update: No, here is a counterexample: LCS( aba, aba, bab, bab ) = ab, ba. The LCS doesn't ensure the subsequence will be a palindrome, you probably need to add this constraint. Anyway, the recurrence equation for LCS is a good starting point.
If you implement LCS in a generator style, so that it generates all LCS of length n, then n-1, n-2 etc., then you should be able to fairly efficiently compute the longest common member in LCS-gen(string1, reverse-string1), LCS-gen(string2, reverse-string2). But I havn't checked yet, if there is a highly efficient LCS-gen.
Here's my foolproof walkthrough line by line since this is quite common and most of the time people explain the dynamic programming part 70% and stop at the gory details.
1) Optimal Substructure:
Let X[0..n-1]
be the input sequence of length n and L(0, n-1)
be the length of the longest palindromic subsequence of X[0..n-1].
If last and first characters of X are same, then L(0, n-1) = L(1, n-2) + 2
. wait why, what if the 2nd and the 2nd to last characters are not
the same, wouldn't the last and first bing the same be useless. nope this "subsequence" not have to be continuous.
/* Driver program to test above functions */
int main()
{
char seq[] = "panamamanap";
int n = strlen(seq);
printf ("The lnegth of the LPS is %d", lps(seq, 0, n-1));
getchar();
return 0;
}
int lps(char *seq, int i, int j)
{
// Base Case 1: If there is only 1 character
if (i == j)
return 1;
// Base Case 2: If there are only 2 characters and both are same
if (seq[i] == seq[j] && i + 1 == j)
return 2;
// If the first and last characters match
if (seq[i] == seq[j])
return lps (seq, i+1, j-1) + 2;
// If the first and last characters do not match
else return max( lps(seq, i, j-1), lps(seq, i+1, j) );
}
Considering the above implementation, following is a partial recursion tree for a sequence of length 6 with all different characters.
L(0, 5)
/ \
/ \
L(1,5) L(0,4)
/ \ / \
/ \ / \
L(2,5) L(1,4) L(1,4) L(0,3)
In the above partial recursion tree, L(1, 4)
is being solved twice. If we draw the complete recursion tree, then we can see that there are
many subproblems which are solved again and again. Like other typical Dynamic Programming(DP) problems, recomputations of same subproblems
can be avoided by constructing a temporary array L[][]
in bottom up manner.
// Returns the length of the longest palindromic subsequence in seq
int lps(char *str)
{
int n = strlen(str);
int i, j, cl;
int L[n][n]; // Create a table to store results of subproblems
// Strings of length 1 are palindrome of length 1
for (i = 0; i < n; i++)
L[i][i] = 1;
for (cl=2; cl<=n; cl++) //again this is the length of chain we are considering
{
for (i=0; i<n-cl+1; i++) //start at i
{
j = i+cl-1; //end at j
if (str[i] == str[j] && cl == 2) //if only 2 characters and they are the same then set L[i][j] = 2
L[i][j] = 2;
else if (str[i] == str[j]) //if greater than length 2 and first and last characters match, add 2 to the calculated value of the center stripped of both ends
L[i][j] = L[i+1][j-1] + 2;
else
L[i][j] = max(L[i][j-1], L[i+1][j]); //if not match, then take max of 2 possibilities
}
}
return L[0][n-1];
}
so this is just like the non dynamic programming logically, it's just here we save the result in an array so we don't calculate the same thing over and over
It's the same as this problem: http://code.google.com/codejam/contest/1781488/dashboard#s=p2
http://code.google.com/codejam/contest/1781488/dashboard#s=a&a=2
In the code below I give you a cd(s) method (which returns a char dict which tells you where the next or previous char in the string is that is equal to that char). Use this to implement a dynamic programming solution for which i've also given you sample code.
With dynamic programming there are only len(s1)*len(s1)/2 states so order(N^2) is possible.
The idea is to either take a char from s1 or not take it. If you take the char off s1, you must take it from the front and the back (of both s1 and s2). If you don't take it, you move ahead 1. (this means the state of s2 keeps in sync with the state of s1 because you always take greedily from the outside of both - so you only worry about how many states s1 can have).
This code gets you most of the way there. cd1 (char dict 1) helps you find the next character in s1 equal to the char you're at (both forward and backward).
In the recursive solve() method you need to decide on what start1, end1.. etc should be. Add 2 to the total each time you take a character (unless start1 == end1 - then add 1)
s1 = "afbcdfca"
s2 = "bcadfcgyfka"
def cd(s):
"""returns dictionary d where d[i] = j where j is the next occurrence of character i"""
char_dict = {}
last_pos = {}
for i, char in enumerate(s):
if char in char_dict:
_, forward, backward = char_dict[char]
pos = last_pos[char]
forward[pos] = i
backward[i] = pos
last_pos[char] = i
else:
first, forward, backward = i, {}, {}
char_dict[char] = (first, forward, backward)
last_pos[char] = i
return char_dict
print cd(s1)
"""{'a': ({0: 7}, {7: 0}), 'c': ({3: 6}, {6: 3}), 'b': ({}, {}), 'd': ({}, {}), 'f': ({1: 5}, {5: 1})}"""
cd1, cd2 = cd(s1), cd(s2)
cache = {}
def solve(start1, end1, start2, end2):
state = (start1, end1)
answer = cache.get(state, None)
if answer:
return answer
if start1 < end1:
return 0
c1s, c1e = s1[start1], s1[end1]
c2s, c2e = s2[start2], s2[end2]
#if any of c1s, c1e, c2s, c2e are equal and you don't take, you must
#skip over those characters too:
dont_take_end1 = solve(start1, end1 - 1, start2, end2)
do_take_end1 = 2
if do_take_end1:
end1_char = s1[end1]
#start1 = next character along from start1 that equals end1_char
#end1 = next char before end1 that equals end1_char
#end2 = next char before end2 that..
#start2 = next char after .. that ..
do_take_end1 += solve(start1, end1, start2, end2)
answer = cache[state] = max(dont_take_end1, do_take_end1)
return answer
print solve(0, len(s1), 0, len(s2))