2
votes

I have a set of strings (less than 30) of length 1 to ~30. I need to find the subset of at least two strings that share the longest possible prefix- + suffix-combination.

For example, let the set be

Foobar
Facar
Faobaron
Gweron
Fzobar

The prefix/suffix F/ar has a combined length of 3 and is shared by Foobar, Facar and Fzobar; the prefix/suffix F/obar has a combined length of 5 and is shared by Foobar and Fzobar. The searched-for prefix/suffix is F/obar.

Note that this is not to be confused with the longest common prefix/suffix, since only two or more strings from the set need to share the same prefix+suffix. Also note that the sum of the lengths of both the prefix and the suffix is what is to be maximized, so both need to be taken into account. The prefix or suffix may be the empty string.

Does anyone know of an efficient method to implement this?

1
If you want to maximize prefix + suffix, don't see the need to choose a subset of more than 2 elements - juvian

1 Answers

0
votes

How about this:

maxLen := -1;
for I := 0 to Len(A) - 1 do
  if Len(A[I]) > maxLen then // (1)
    for J := 0 to Len(A[I]) do
      for K := 0 to Len(A[I]) - J do
        if J+K > maxLen then // (2)
        begin
          prf := LeftStr(A[I], J);
          suf := RightStr(A[I], K);
          found := False;
          for m := 0 to Len(sufList) - 1 do
            if (sufList[m] = suf) and (prfList[m] = prf) then
            begin
              maxLen := J+K;
              Result := prf+'/'+suf;
              found := True;
              // (3)
              n := 0;
              while n < Len(sufList) do
                if Len(sufList[n])+Len(prfList[n]) <= maxLen then
                begin
                  sufList.Delete(n);
                  prfList.Delete(n);
                end
                else
                  Inc(n);
              // (end of 3)
              Break;
            end;
          if not found then
          begin
            sufList.Add(suf);
            prfList.Add(prf);
          end;
        end;

In this example maxLen keeps sum of lengths of longest found prefix/suffix so far. The most important part of it is the line marked with (2). It bypasses lots of unnecessary string comparisons. In section (3) it eliminates any existing prefix/suffix that is shorter than newly found one (winch is duplicated).