2
votes

Given a string and array of strings find the longest suffix of string in array.

for example

string = google.com.tr

array = tr, nic.tr, gov.nic.tr, org.tr, com.tr

returns com.tr

I have tried to use binary search with specific comparator, but failed.

C-code would be welcome.

Edit:

I should have said that im looking for a solution where i can do as much work as i can in preparation step (when i only have a array of suffixes, and i can sort it in every way possible, build any data-structure around it etc..), and than for given string find its suffix in this array as fast as possible. Also i know that i can build a trie out of this array, and probably this will give me best performance possible, BUT im very lazy and keeping a trie in raw C in huge peace of tangled enterprise code is no fun at all. So some binsearch-like approach will be very welcome.

5
do you want us to solve your homework? - No Idea For Name
What are your complexity requirements? Will you do this many times for one string and many different arrays or all the time for a single array but different string each time? - Michał Trybus
I don't want to spoil the fun, so I'll just suggest that you could build some sort of tree structure out of the reversed array elements and then traverse that tree as you match characters input string from the back. - Frerich Raabe
@FrerichRaabe is the idea to avoid duplication of effort checking array strings that themselves contain other array strings? - tacos_tacos_tacos
I immediately have to ask you EXACTLY what you mean because I'm not sure. When you say longest suffix, do you mean "longest suffix of string found anywhere in an array element", or "longest suffix of string that /is/ an array element", or "longest suffix of string that is also a suffix of an array element"? All of these can be solved in O(n * m) where n is number of characters in array, m number of characters in string. - Veltas

5 Answers

1
votes

Assuming constant time addressing of characters within strings this problem is isomorphic to finding the largest prefix.

  1. Let i = 0.

  2. Let S = null

  3. Let c = prefix[i]

  4. Remove strings a from A if a[i] != c and if A. Replace S with a if a.Length == i + 1.

  5. Increment i.

  6. Go to step 3.

Is that what you're looking for?


Example:

prefix = rt.moc.elgoog

array = rt.moc, rt.org, rt.cin.vof, rt.cin, rt

Pass 0: prefix[0] is 'r' and array[j][0] == 'r' for all j so nothing is removed from the array. i + 1 -> 0 + 1 -> 1 is our target length, but none of the strings have a length of 1, so S remains null.

Pass 1: prefix[1] is 't' and array[j][1] == 'r' for all j so nothing is removed from the array. However there is a string that has length 2, so S becomes rt.

Pass 2: prefix[2] is '.' and array[j][2] == '.' for the remaining strings so nothing changes.

Pass 3: prefix[3] is 'm' and array[j][3] != 'm' for rt.org, rt.cin.vof, and rt.cin so those strings are removed.

etc.

0
votes

Another naïve, pseudo-answer.

Set boolean "found" to false. While "found" is false, iterate over the array comparing the source string to the strings in the array. If there's a match, set "found" to true and break. If there's no match, use something like strchr() to get to the segment of the string following the first period. Iterate over the array again. Continue until there's a match, or until the last segment of the source string has been compared to all the strings in the array and failed to match.

Not very efficient....

0
votes

Naive, pseudo-answer:

  1. Sort array of suffixes by length (yes, there may be strings of same length, which is a problem with the question you are asking I think)
  2. Iterate over array and see if suffix is in given string
  3. If it is, exit the loop because you are done! If not, continue.

Alternatively, you could skip the sorting and just iterate, assigning the biggestString if the currentString is bigger than the biggestString that has matched.

Edit 0:

Maybe you could improve this by looking at your array before hand and considering "minimal" elements that need to be checked.

For instance, if .com appears in 20 members you could just check .com against the given string to potentially eliminate 20 candidates.

Edit 1:

On second thought, in order to compare elements in the array you will need to use a string comparison. My feeling is that any gain you get out of an attempt at optimizing the list of strings for comparison might be negated by the expense of comparing them before doing so, if that makes sense. Would appreciate if a CS type could correct me here...

0
votes

If your array of strings is something along the following:

char string[STRINGS][MAX_STRING_LENGTH];
string[0]="google.com.tr";
string[1]="nic.tr";

etc, then you can simply do this:

int x, max = 0;

for (x = 0; x < STRINGS; x++) {
    if (strlen(string[x]) > max) {
        max = strlen(string[x]);
    }
}

x = 0;

while(true) {
    if (string[max][x] == ".") {
       GOTO out;
    }
    x++;
}

out:

char output[MAX_STRING_LENGTH];
int y = 0;

while (string[max][x] != NULL) {
    output[y++] = string[++x];
}

(The above code may not actually work (errors, etc.), but you should get the general idea.

0
votes

Why don't you use suffix arrays ? It works when you have large number of suffixes.

Complexity, O(n(logn)^2), there are O(nlogn) versions too.

Implementation in c here. You can also try googling suffix arrays.