Longest common prefix for n string

8

votes

Given n string of max length m. How can we find the longest common prefix shared by at least two strings among them?

Example: ['flower', 'flow', 'hello', 'fleet']

Answer: fl

I was thinking of building a Trie for all the string and then checking the deepest node (satisfies longest) that branches out to two/more substrings (satisfies commonality). This takes O(n*m) time and space. Is there a better way to do this

algorithm

@Mark I believe this example would be flow. Judging by the context of the proposed solution, it only has to be common to at least 2, not to all. I agree some clarification from OP is necessary here. - corsiKa

a string may start without 'fl'. 'hello' was put to prove a point that it could be any strings where in 1 string need not have any common prefix with the others - shreyasva

14

votes

Why to use trie(which takes O(mn) time and O(mn) space, just use the basic brute force way. first loop, find the shortest string as minStr, which takes o(n) time, second loop, compare one by one with this minStr, and keep an variable which indicates the rightmost index of minStr, this loop takes O(mn) where m is the shortest length of all strings. The code is like below,

public String longestCommonPrefix(String[] strs) {
    if(strs.length==0) return "";
    String minStr=strs[0];

    for(int i=1;i<strs.length;i++){
        if(strs[i].length()<minStr.length())
            minStr=strs[i];
    }
    int end=minStr.length();
    for(int i=0;i<strs.length;i++){
        int j;
        for( j=0;j<end;j++){
            if(minStr.charAt(j)!=strs[i].charAt(j))
                break;
        }
        if(j<end)
            end=j;
    }
    return minStr.substring(0,end);
}

14

votes

there is an O(|S|*n) solution to this problem, using a trie. [n is the number of strings, S is the longest string]

(1) put all strings in a trie
(2) do a DFS in the trie, until you find the first vertex with more than 1 "edge".
(3) the path from the root to the node you found at (2) is the longest common prefix.

There is no possible faster solution then it [in terms of big O notation], at the worst case, all your strings are identical - and you need to read all of them to know it.

5

votes

I would sort them, which you can do in n lg n time. Then any strings with common prefixes will be right next to eachother. In fact you should be able to keep a pointer of which index you're currently looking at and work your way down for a pretty speedy computation.

2

votes

As a completely different answer from my other answer...

You can, with one pass, bucket every string based on its first letter.

With another pass you can sort each bucket based on its second later. (This is known as radix sort, which is O(n*m), and O(n) with each pass.) This gives you a baseline prefix of 2.

You can safely remove from your dataset any elements that do not have a prefix of 2.

You can continue the radix sort, removing elements without a shared prefix of p, as p approaches m.

This will give you the same O(n*m) time that the trie approach does, but will always be faster than the trie since the trie must look at every character in every string (as it enters the structure), while this approach is only guaranteed to look at 2 characters per string, at which point it culls much of the dataset.

The worst case is still that every string is identical, which is why it shares the same big O notation, but will be faster in all cases as is guaranteed to use less comparisons since on any "non-worst-case" there are characters that never need to be visited.

1

votes

public String longestCommonPrefix(String[] strs) {

    if (strs == null || strs.length == 0)
        return "";

    char[] c_list = strs[0].toCharArray();
    int len = c_list.length;
    int j = 0;
    for (int i = 1; i < strs.length; i++) {
        for (j = 0; j < len && j < strs[i].length(); j++) 
           if (c_list[j] != strs[i].charAt(j)) 
            break;
        len = j;
    }

    return new String(c_list).substring(0, len);

}

1

votes

It happens that the bucket sort (radix sort) described by corsiKa can be extended such that all strings are eventually placed alone in a bucket, and at that point, the LCP for such a lonely string is known. Further, the shustring of each string is also known; it is one longer than is the LCP. The bucket sort is defacto the construction of a suffix array but, only partially so. Those comparisons that are not performed (as described by corsiKa) indeed represent those portions of the suffix strings that are not added to the suffix array. Finally, this method allows for determination of not just the LCP and shustrings, but also one may easily find those subsequences that are not present within the string.

0

votes

Since the world is obviously begging for an answer in Swift, here's mine ;)

func longestCommonPrefix(strings:[String]) -> String {

    var commonPrefix = ""
    var indices = strings.map { $0.startIndex}

outerLoop:

    while true {
        var toMatch: Character = "_"

        for (whichString, f) in strings.enumerate() {

            let cursor = indices[whichString]

            if cursor == f.endIndex {   break outerLoop     }

            indices[whichString] = cursor.successor()

            if whichString == 0     {   toMatch = f[cursor] }
            if toMatch != f[cursor] {   break outerLoop     }
        }

        commonPrefix.append(toMatch)
    }

    return commonPrefix
}

Swift 3 Update:

func longestCommonPrefix(strings:[String]) -> String {

    var commonPrefix = ""
    var indices = strings.map { $0.startIndex}

    outerLoop:

        while true {
            var toMatch: Character = "_"

            for (whichString, f) in strings.enumerated() {

                let cursor = indices[whichString]

                if cursor == f.endIndex {   break outerLoop     }

                indices[whichString]  = f.characters.index(after: cursor)

                if whichString == 0     {   toMatch = f[cursor] }
                if toMatch != f[cursor] {   break outerLoop     }
            }

            commonPrefix.append(toMatch)
    }

    return commonPrefix
}

What's interesting to note:

this runs in O^2, or O(n x m) where n is the number of strings and m is the length of the shortest one.
this uses the String.Index data type and thus deals with Grapheme Clusters which the Character type represents.

And given the function I needed to write in the first place:

/// Takes an array of Strings representing file system objects absolute
/// paths and turn it into a new array with the minimum number of common
/// ancestors, possibly pushing the root of the tree as many level downwards
/// as necessary
///
/// In other words, we compute the longest common prefix and remove it

func reify(fullPaths:[String]) -> [String] {

    let lcp = longestCommonPrefix(fullPaths)

    return fullPaths.map {
        return $0[lcp.endIndex ..< $0.endIndex]
    }
}

here is a minimal unit test:

func testReifySimple() {
    let samplePaths:[String] = [
        "/root/some/file"
    ,   "/root/some/other/file"
    ,   "/root/another/file"
    ,   "/root/direct.file"
    ]

    let expectedPaths:[String] = [
        "some/file"
    ,   "some/other/file"
    ,   "another/file"
    ,   "direct.file"
    ]

    let reified = PathUtilities().reify(samplePaths)

    for (index, expected) in expectedPaths.enumerate(){
        XCTAssert(expected == reified[index], "failed match, \(expected) != \(reified[index])")
    }
}

0

votes

Perhaps a more intuitive solution. Channel the already found prefix out of earlier iteration as input string to the remaining or next string input. [[[w1, w2], w3], w4]... so on], where [] is supposedly the LCP of two strings.

public String findPrefixBetweenTwo(String A, String B){
    String ans = "";
    for (int i = 0, j = 0; i < A.length() && j < B.length(); i++, j++){
        if (A.charAt(i) != B.charAt(j)){
            return i > 0 ? A.substring(0, i) : "";
        }
    }
    // Either of the string is prefix of another one OR they are same.
    return (A.length() > B.length()) ?  B.substring(0, B.length()) : A.substring(0, A.length());
}
public String longestCommonPrefix(ArrayList<String> A) {
    if (A.size() == 1) return A.get(0);

    String prefix = A.get(0);
    for (int i = 1; i < A.size(); i++){
        prefix = findPrefixBetweenTwo(prefix, A.get(i)); // chain the earlier prefix 
    }
    return prefix;
}

Longest common prefix for n string

8 Answers