2 votes

I have an input of 288 bits (comprising 4 × 32-bit identity function outputs and 10 × 16-bit integers). I need to hash this down to 96 bits with as few collisions as possible. The goal could be described as key compression that tolerates a small probability of collisions.

I'm aware that a CRC is a bijective mapping (for input the same width as the CRC), which as I understand it guarantees a perfectly even distribution. My thinking is that I should therefore be able to run three parallel CRC-32 paths over the input, giving a 96-bit lossy hash (obviously no longer bijective) with optimal distribution.
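To make this concrete, here is a minimal C sketch of what I have in mind (the function names are mine, and I simply picked three well-known polynomials; whether those three are actually a sensible combination is part of what I'm asking):

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise, reflected CRC-32 with a caller-supplied (reflected) polynomial.
   Slow, but fine for a fixed 36-byte input. */
static uint32_t crc32_generic(uint32_t poly, const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc & 1u) ? (crc >> 1) ^ poly : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

/* 96-bit hash = three CRC-32 lanes over the same 36-byte (288-bit) input,
   each lane using a different well-known polynomial. */
void hash96_crc(const uint8_t key[36], uint32_t out[3])
{
    out[0] = crc32_generic(0xEDB88320u, key, 36); /* CRC-32  (ISO-HDLC)   */
    out[1] = crc32_generic(0x82F63B78u, key, 36); /* CRC-32C (Castagnoli) */
    out[2] = crc32_generic(0xEB31D82Eu, key, 36); /* CRC-32K (Koopman)    */
}
```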

However, I'm also aware that CRC is not used for such applications. An algorithm such as MetroHash would typically be used.

Could someone explain to me why CRC is (or isn't) a bad idea for this application?

Note: This is not intended for anything secure.

1
CRC can be implemented efficiently with shift registers plus a few gates - software implementations are done for compatibility or nostalgia. I'd try a Fletcher sum (a sketch follows these comments), even if this is just five to nine terms. The data size being fixed, you can play with "the factors". Note that there are checksums which are problematic for small sizes, Adler-32 being one. (Another tag to consider is checksum.) - greybeard
(Just revisited the English Wikipedia article on Fletcher's checksum - the "Optimizations" section is ridiculous. You don't do a modulus, just add-with-carry.) - greybeard
@greybeard You can edit Wikipedia too, you know :-) - m69 ''snarky and unwelcoming''
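For reference, my reading of greybeard's Fletcher/add-with-carry suggestion, as a minimal C sketch (the names are mine; note this yields only 64 bits, so reaching 96 would still need e.g. a third accumulator, which goes beyond what the comment spells out):

```c
#include <stdint.h>

/* Addition mod 2^32 - 1 via end-around carry ("add-with-carry"),
   instead of an explicit modulus. */
static uint32_t add_mod_ffffffff(uint32_t a, uint32_t b)
{
    uint64_t s = (uint64_t)a + b;
    return (uint32_t)((s & 0xFFFFFFFFu) + (s >> 32)); /* fold the carry back in */
}

/* Fletcher-style checksum over the 288-bit key viewed as nine 32-bit words. */
uint64_t fletcher64ish(const uint32_t words[9])
{
    uint32_t sum1 = 0, sum2 = 0;
    for (int i = 0; i < 9; i++) {
        sum1 = add_mod_ffffffff(sum1, words[i]);
        sum2 = add_mod_ffffffff(sum2, sum1);
    }
    return ((uint64_t)sum2 << 32) | sum1;
}
```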

1 Answer

2 votes

Sure, that can work, but there are probably better approaches.

For it to work, you would need to use three different CRC-32s with three different polynomials. And even then, you have to make sure the polynomials don't share any common factors (e.g. x+1), so that there are no correlated bits between the three results.
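If it helps, that "no common factors" condition is easy to check directly with a small GF(2) polynomial GCD. A quick sketch (using the three polynomials named in the question; the bitmask encoding writes out the implicit x^32 term):

```c
#include <stdint.h>
#include <stdio.h>

/* Degree of a GF(2) polynomial stored as a bitmask (bit i = coefficient of x^i). */
static int poly_degree(uint64_t p)
{
    int d = -1;
    while (p) { d++; p >>= 1; }
    return d;
}

/* a mod b over GF(2): cancel the leading term of a with a shifted copy of b. */
static uint64_t poly_mod(uint64_t a, uint64_t b)
{
    int db = poly_degree(b), da;
    while ((da = poly_degree(a)) >= db)
        a ^= b << (da - db);
    return a;
}

/* Euclid's algorithm over GF(2)[x]; a gcd of 0x1 means "no common factor". */
static uint64_t poly_gcd(uint64_t a, uint64_t b)
{
    while (b) {
        uint64_t r = poly_mod(a, b);
        a = b;
        b = r;
    }
    return a;
}

int main(void)
{
    uint64_t g[3] = {
        0x104C11DB7ULL, /* CRC-32  (ISO-HDLC)   */
        0x11EDC6F41ULL, /* CRC-32C (Castagnoli) */
        0x1741B8CD7ULL  /* CRC-32K (Koopman)    */
    };
    for (int i = 0; i < 3; i++)
        for (int j = i + 1; j < 3; j++)
            printf("gcd(g%d, g%d) = 0x%llx\n", i, j,
                   (unsigned long long)poly_gcd(g[i], g[j]));
    return 0;
}
```

Note that the CRC-32C and CRC-32K generators both have an even number of terms, so both are divisible by x+1 and that particular pair already fails the test.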

Better would be an approach like the one used in xxHash, but extended to 96 bits. That would be faster in software.
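I can't spell out a full 96-bit xxHash-style construction here, but as a rough, practical sketch (assuming the stock xxHash library is available; the function name is just illustrative) you could take the 128-bit XXH3 result and keep 96 of its bits:

```c
#include <stdint.h>
#include "xxhash.h"   /* https://github.com/Cyan4973/xxHash */

/* 96-bit hash of the 36-byte key: compute the 128-bit XXH3 hash and keep
   96 bits of it. Not a native 96-bit design, just the easiest way to get
   there with the stock library. */
void hash96_xxh(const uint8_t key[36], uint32_t out[3])
{
    XXH128_hash_t h = XXH3_128bits(key, 36);
    out[0] = (uint32_t)h.low64;
    out[1] = (uint32_t)(h.low64 >> 32);
    out[2] = (uint32_t)h.high64;
}
```

For a 36-byte key the cost of computing 128 bits and discarding 32 should be negligible.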

Why 96 bits? That seems like an unnecessarily long hash.