0
votes

I am trying to implement the Rabin Karp algorithm with mod. The hash function which i am using is:

H1= c1*a^k-1 + c2*a^k-2 +c3*a^k-3 +…+ck*a^0

Here cx is the ASCII value of the character. And to roll it I first drop the first term by subtracting it, then multiply by a and add the new term by multiplying it with a^0.

Now the problem is to deal with large values i have used mod operations but doing that i am not able to roll it correctly. My code is as follows:

public class RabinKarp {
private static final int base = 26;
private static final int mod = 1180637;

public static void main(String[] args) {
    String text = "ATCAAGTTACCAATA";
    String pattern = "ATA";
    char[] textArr = text.toCharArray();
    char[] patternArr = pattern.toCharArray();
    System.out.println(getMatchingIndex(textArr, patternArr));
}

public static int getMatchingIndex(char[] textArr, char[] patternArr) {
    int n = textArr.length;
    int m = patternArr.length;
    int patternHash = getHashForPatternSize(patternArr, m);
    int textHash = getHashForPatternSize(textArr, m);
    for(int i = 0; i < n-m; i++) {
        if(patternHash == textHash && checkMatch(textArr, patternArr, i, m))
            return i;
        textHash = rollingHash(textArr, textHash, i, m);    
    }
    return -1;
}

public static boolean checkMatch(char[] textArr, char[] patternArr, int i, int m) {
    for(int j = 0; j < m; j++,i++) {
        if(textArr[i] != patternArr[j])
            return false;
    }
    return true;
}

public static int rollingHash(char[] textArr, int textHash, int i, int m) {
    return (textHash * base - modularExponentiation(base, m, mod) * (int)textArr[i] + (int) textArr[i+m])%mod;
}

public static int getHashForPatternSize(char[] arr, int m) {
    int hash = 0;
    for(int i = 0, p = m; i < m; i++, p--) {
        hash = (hash%mod + calcHash(arr[i], p)%mod)%mod;
    }
    return hash;
}

public static int calcHash(char alphabet, int p) {
    return (((int) alphabet)%mod * modularExponentiation(base, p, mod)%mod)%mod;
}

public static int modularExponentiation(int base, int p, int mod) {
    if(p == 0)
        return 1;
    if(p%2 == 0)
        return modularExponentiation((base*base)%mod, p/2, mod);
    else
        return (base*modularExponentiation((base*base)%mod, (p-1)/2, mod))%mod;
}
}

Problem is that textHash and patternHash do not match at any point. I am sure that the problem is with the mod operations. Can anyone tell how to have mod as well as to use the rolling hash correctly. I would be very thankful.

1
You might need to debug the method modularExponentation, but I think you don't need to do it recursively. You could do it iteratively to make it less complicated. - SomeDude
the method is correct but the problem is the exponentiation has a sure effect and when i am trying to roll it (textHash - calcHash(textArr[i], 0))/base - this division by base is not working i think.....my target is to make this algorithm good for bigger values. I can surely choose the base as 10 and choose not to use mod but that doesnt serve my purpose - user6575289
I don't know what language you're using but in most C-like languages the % operator does not correctly compute mod unless both operands are positive. - rici
No problem it is all sorted now @rici - user6575289

1 Answers

1
votes

The usual way to compute a Rabin-Karp rolling hash is to consider the characters in big-endian order, rather than your little-endian solution. This makes the arithmetic much easier since it avoids division. Modular division is non-trivial and you cannot simply implement it as (p/q)%b.

If we take the rolling hash as

H0…k-1 = (c0*ak-1 + c1*ak-2 + c2*ak-3 …+… ck-1*a0) mod b

Then the next term is:

H1…k   = (         c1*ak-1 + c2*ak-2 …+… ck-1*a1 + ck*a0) mod b

And we can easily see that

H1…k   = (a * H0…k-1 - c0*ak + ck) mod b

If we then precompute m == ak mod b, that becomes:

H1…k   = (a * H0…k-1 - m * c0 + ck) mod b

which is much less work on each iteration, and does not depend on division at all.