I have a list of strings for which I need to find ALL common unique substrings (actually paths) with minimal length in them. Example:
/a/b/c
/a/b
/a
/d/e/f
/d/e
/g/h
For this input, I need the following result:
/a
/d/e
/g/h
As you see, I need the paths (or substrings) with the minimal length that have a unique prefix. /a is the minimal substring for all paths starting with /a. /d/e is the minimal substring for all paths starting with /d/e. The same goes for /g/h.
A practical application of this is to find all roots of the path trees that have a certain file in it to analyze them further. Consider this example:
/a/b/c/index.html
/a/b/index.html
/a/index.html
/d/e/f/index.html
/d/e/index.html
/g/h/index.html
Let's say I want to have the topmost (in terms of the root) paths that contain an index.html file. As a result, I want "/a/index.html", "/d/e/index.html" and "/g/h/index.html".
Any ideas? There is a lot of theory and examples for the "simple" longest common substring problem, but I have not yet found a solution for finding ALL the common longest substrings efficiently.
A solution with pseudo code would be highly appreciated.
/g/h
would be part of the output. – Gordon Linoffa
covers the first three entries (abc
,ab
anda
),de
coversde
anddef
andgh
coversgh
. So there is no element in the source set which hasn't got a substring in the result set. That's my guess :) – biziclop