0
votes

I've read before here on SO (EDIT: Incremental Checksums) that there are some checksum algorithms (I think one of those is adler32) that support the following property:

adler32('abc'); // 123
adler32('def'); // 456
adler32('abcdef'); // 579 (123 + 456)

Please note that the results are only examples to demonstrate what I want to archieve. I've tried some examples with the hash extension in PHP with the adler and fletcher modules but the values don't seem to add up. Can someone show me some implementation examples?

5
hash(string) { return length(string); } =)Zed
Do you have a real-world problem where this is an issue?Brian
Brian: no, I just read it somewhere here on SO and I found it interesting. Does that make it a less valid question?Alix Axel

5 Answers

3
votes

If what you are looking for is a hash function on strings such that hash(s1+s2) = hash(s1) + hash(s2), I am pretty sure that all functions hash with this property are of the form (sum of the hash_char(c) for all the chars c in the string), for a fixed function hash_char.

That does not make for a very good hash function. Do you not misremember what you saw about adler32?

2
votes

Adler32 is not a hash function.
There are no good hash functions with this property.

There are, however, encryption functions with a similar property; known as homomorphism.
Basically:

E(text1)*E(text2)=cipher  
D(cipher) = text1 + text2  

Where E is the encryption function, D is the encryption function, and the texts are numbers (or data serialized as a number). Note that schemes using operations other than + & * do exist.

Two examples schemes: ElGamal, and Paillier. Both are slow, which is a common feature of asymmetric encryption schemes. Both of these examples use * & +.

1
votes

Unless I misunderstand I don't see how this is possible without an unnecessary number of collisions.

hash('cde') => 345 
hash('aaabcd') => 345
1
votes

Adler32 is a checksum algorithm rather than a hash, and doesn't have the property you describe.

It's quite common for hashes to have the property that the state of the hash can be described by the result of the hash, so if

H ( "aaa" ) = G ( H0, "aaa" )

where G is the function of the hash and the concatenation to the hash, and H0 is an initial value, often zero. Then

H ( "aaa" ) = G ( H("aa"), "a" ) = G ( G ( H("a"), "a" ), "a" ) = G ( G ( G ( H0, "a" ), "a" ), "a" )

But a function which is simply adding some function of the characters in the input together ( as is implied by your rule that G ( x .concat. y ) = G(x) + G(y) ) will collide for all permutations of that input.

One such function might create a 128 bit hash from an ASCII input:

H(x) = sum ( b => 1 << b foreach byte b in x ) 

which would have the property that

H("abcdef") == H("abc") + H("def") 
            == H("a") + H("b") + H("c") + H("d") + H("e") + H("f") 
            == H("abcdfe")
            == H("abcfde")
            == H("abcfed")
 etc.
0
votes

In view of Alder32 algorithm description, I don't see how the property you describe with regard to concatenation of string is possible.

Edit: removed flawed demo that such a property for a hash is impossible. Indeed an example of such a hash is the one that simply adds (well maybe modulo some big number, but that is understandable) the ASCII value of the characters in the input. (Thank you Pete Kirkham to point this error of mine).

Rather than Alder32 you may be referring to homomorphic encryption algorithms. You may find a list of such encryption schemes here. Craig Gentry, a research at IBM, also announced his success with producing fully homomorphic encryption, but I do not know if any technical details were published at this time (This would be big news, BTW).