Find the maximum number of words (from a given dictionary) that collectively represent a given text

Question

Input: Text T and a set of n words over a finite alphabet.

We have to find the longest representation of words that when combined make up T. This can be done by merging words together.

There could be no such representaion
We can use the same word from the dictionary more than once
The solution must be optimal
There is no constraint on the length of the output

For instance, given the output:

words ={ "na", "ba", "banana", "bana", "a", "nan"}

T= "banana"

The output should be "ba""nan""a" because the number of words is 3. "bana""na" and "banana" consist of 2 and 1 word/s respectively, therefore this is not the maximum number of words, and the output should be "ba""nan""a".

I have tried to solve this with a greedy algorithm, but it does not work. So, I think that the solution is dynamic programming, but I don't know how. I tried another approach, to do this with a Trie - by searching for the current letter in the vertices of the trie.

Search for the first letter of T in the Trie
Check the other letters on the vertice if T is not the sole letter on the vertice
If T is the sole letter on the vertice, check its children
If the children do not match with the i-th letter of T, search for the ith letter in the Trie again
If the letters on the curent vertice do not match, then there is no such representaion.

Is this an optimal, or even proper solution? If not, what is the dynamic programming solution?

Thank you in advance!

How is your algorithm ensuring you will find not only a solution, but the maximised solution? The bullet list is not really a clear algorithm. Could you not provide (pseudo)code that you have tried with? What were the results? — trincot
Well yeah... I just found a counter-example for my method :D — prism
You should add that to your question, because there you currently ask whether it is optimal or proper solution. Now you know it cannot be, so your question needs to be rephrased. Please add as much information as you can. For instance, (pseudo)code, and the counter example. — trincot

xashru xashru · Accepted Answer · 2020-04-26T12:48:28

You can solve this using dynamic programming. Here is a solution in python using top-down approach with memoization

memo = {} # memoization
def find_max(words, t, pos):
    if pos in memo:
        return memo[pos]
    ret = 0
    for i in range(pos+1, 1+len(t)):
        sub = t[pos:i]
        if sub in words:
            ret = max(ret, 1 + find_max(words, t, i))
    memo[pos] = ret
    return ret

words = ["na", "ba", "banana", "bana", "a", "nan"]
t = "banana"
ret = find_max(words, t, 0)   # returns 3

You can optimize the for loop for finding the matched words starting from pos in a more efficient way. Trie can be used for this.

Find the maximum number of words (from a given dictionary) that collectively represent a given text

1 Answers