Longest Common Palindromic Subsequence

5

votes

Is there any efficient algorithm that counts the length of Longest Common Palindromic Subsequence of two given strings?

for example:

string 1. afbcdfca

string 2. bcadfcgyfka

the LCPS is 5 and LCPS string is afcfa.

stringalgorithmlcs

What does this have to do with palindromes? - Keith Randall

Ah, I see, the LCPS is "afcfa", not "afca". - Keith Randall

dynamic programming question. please look at w.r.t DP - Dhruvenkumar Shah

I googled your request and found a paper on arxiv Computing a Longest Common Palindromic Subsequence. - Kwariz

@Kwariz this is really a good article on LCPS. But it's complexity is O(n^4)! I asked this question here to know whether there is any algorithm with less run time or not. - Mostafiz Rahman

5

votes

Yes.

Just use an algorithm for LCS for more than two sequences.

If I'm not mistaken, then

 LCS( afbcdfca, acfdcbfa, bcadfcgyfka, akfygcfdacb) = afcfa

It's up to you to figure out strings #2 and #4.

Update: No, here is a counterexample: LCS( aba, aba, bab, bab ) = ab, ba. The LCS doesn't ensure the subsequence will be a palindrome, you probably need to add this constraint. Anyway, the recurrence equation for LCS is a good starting point.

If you implement LCS in a generator style, so that it generates all LCS of length n, then n-1, n-2 etc., then you should be able to fairly efficiently compute the longest common member in LCS-gen(string1, reverse-string1), LCS-gen(string2, reverse-string2). But I havn't checked yet, if there is a highly efficient LCS-gen.

1

votes

Here's my foolproof walkthrough line by line since this is quite common and most of the time people explain the dynamic programming part 70% and stop at the gory details.

1) Optimal Substructure: Let X[0..n-1] be the input sequence of length n and L(0, n-1) be the length of the longest palindromic subsequence of X[0..n-1].

If last and first characters of X are same, then L(0, n-1) = L(1, n-2) + 2. wait why, what if the 2nd and the 2nd to last characters are not the same, wouldn't the last and first bing the same be useless. nope this "subsequence" not have to be continuous.

/* Driver program to test above functions */
int main()
{
    char seq[] = "panamamanap";
    int n = strlen(seq);
    printf ("The lnegth of the LPS is %d", lps(seq, 0, n-1));
    getchar();
    return 0;
}

int lps(char *seq, int i, int j)
{   
    // Base Case 1: If there is only 1 character   
    if (i == j)     
        return 1;    

    // Base Case 2: If there are only 2 characters and both are same   
    if (seq[i] == seq[j] && i + 1 == j)     
        return 2;    

    // If the first and last characters match   
    if (seq[i] == seq[j])      
        return lps (seq, i+1, j-1) + 2;    

    // If the first and last characters do not match   
    else return max( lps(seq, i, j-1), lps(seq, i+1, j) );
}

Considering the above implementation, following is a partial recursion tree for a sequence of length 6 with all different characters.

               L(0, 5)
             /        \ 
            /          \  
        L(1,5)          L(0,4)
       /    \            /    \
      /      \          /      \
  L(2,5)    L(1,4)  L(1,4)  L(0,3)

In the above partial recursion tree, L(1, 4) is being solved twice. If we draw the complete recursion tree, then we can see that there are many subproblems which are solved again and again. Like other typical Dynamic Programming(DP) problems, recomputations of same subproblems can be avoided by constructing a temporary array L[][] in bottom up manner.

// Returns the length of the longest palindromic subsequence in seq

int lps(char *str)
{
   int n = strlen(str);
   int i, j, cl;
   int L[n][n];  // Create a table to store results of subproblems


   // Strings of length 1 are palindrome of length 1
   for (i = 0; i < n; i++)
      L[i][i] = 1;

    for (cl=2; cl<=n; cl++)                              //again this is the length of chain we are considering
    {
        for (i=0; i<n-cl+1; i++)                         //start at i
        {
            j = i+cl-1;                                  //end at j
            if (str[i] == str[j] && cl == 2)             //if only 2 characters and they are the same then set L[i][j] = 2
               L[i][j] = 2;
            else if (str[i] == str[j])                   //if greater than length 2 and first and last characters match, add 2 to the calculated value of the center stripped of both ends
               L[i][j] = L[i+1][j-1] + 2;
            else
               L[i][j] = max(L[i][j-1], L[i+1][j]);      //if not match, then take max of 2 possibilities
        }
    }

    return L[0][n-1];
}

so this is just like the non dynamic programming logically, it's just here we save the result in an array so we don't calculate the same thing over and over

0

votes

It's the same as this problem: http://code.google.com/codejam/contest/1781488/dashboard#s=p2

http://code.google.com/codejam/contest/1781488/dashboard#s=a&a=2

In the code below I give you a cd(s) method (which returns a char dict which tells you where the next or previous char in the string is that is equal to that char). Use this to implement a dynamic programming solution for which i've also given you sample code.

With dynamic programming there are only len(s1)*len(s1)/2 states so order(N^2) is possible.

The idea is to either take a char from s1 or not take it. If you take the char off s1, you must take it from the front and the back (of both s1 and s2). If you don't take it, you move ahead 1. (this means the state of s2 keeps in sync with the state of s1 because you always take greedily from the outside of both - so you only worry about how many states s1 can have).

This code gets you most of the way there. cd1 (char dict 1) helps you find the next character in s1 equal to the char you're at (both forward and backward).

In the recursive solve() method you need to decide on what start1, end1.. etc should be. Add 2 to the total each time you take a character (unless start1 == end1 - then add 1)

s1 = "afbcdfca"
s2 = "bcadfcgyfka"

def cd(s):
    """returns dictionary d where d[i] = j where j is the next occurrence of character i"""
    char_dict = {}
    last_pos = {}
    for i, char in enumerate(s):
        if char in char_dict:
            _, forward, backward = char_dict[char]
            pos = last_pos[char]
            forward[pos] = i
            backward[i] = pos
            last_pos[char] = i
        else:
            first, forward, backward = i, {}, {}
            char_dict[char] = (first, forward, backward)
            last_pos[char] = i
    return char_dict

print cd(s1)
"""{'a': ({0: 7}, {7: 0}), 'c': ({3: 6}, {6: 3}), 'b': ({}, {}), 'd': ({}, {}), 'f': ({1: 5}, {5: 1})}"""

cd1, cd2 = cd(s1), cd(s2)

cache = {}
def solve(start1, end1, start2, end2):
    state = (start1, end1)
    answer = cache.get(state, None)
    if answer:
        return answer

    if start1 < end1:
        return 0
    c1s, c1e = s1[start1], s1[end1]
    c2s, c2e = s2[start2], s2[end2]

    #if any of c1s, c1e, c2s, c2e are equal and you don't take, you must
    #skip over those characters too:
    dont_take_end1 = solve(start1, end1 - 1, start2, end2)

    do_take_end1 = 2
    if do_take_end1:
        end1_char = s1[end1]
        #start1 = next character along from start1 that equals end1_char
        #end1 = next char before end1 that equals end1_char
        #end2 = next char before end2 that..
        #start2 = next char after .. that ..
        do_take_end1 += solve(start1, end1, start2, end2)


    answer = cache[state] = max(dont_take_end1, do_take_end1)
    return answer

print solve(0, len(s1), 0, len(s2))

Longest Common Palindromic Subsequence

3 Answers