25
votes

I'm interested in optimizing the hashing of some large files (optimizing wall clock time). The I/O has been optimized well enough already and the I/O device (local SSD) is only tapped at about 25% of capacity, while one of the CPU cores is completely maxed-out.

I have more cores available, and in the future will likely have even more. So far I've only been able to tap into more cores when I happen to need multiple hashes of the same file, say an MD5 AND a SHA256 at the same time. I can use the same I/O stream to feed two or more hash algorithms, and I get the faster algorithms done essentially for free (as far as wall-clock time goes). As I understand most hash algorithms, each new bit changes the entire result, and that makes them inherently challenging or impossible to compute in parallel.
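A minimal sketch of that single-stream, multi-digest idea in Python (the chunk size, queue depth, and helper names here are illustrative; it assumes CPython, whose hashlib releases the GIL while hashing large buffers, so each digest thread can genuinely occupy its own core):

```python
import hashlib
import queue
import threading

def _digest_worker(algo, blocks, results):
    # One hash object per worker; CPython's hashlib releases the GIL
    # while hashing large buffers, so workers can run on separate cores.
    h = hashlib.new(algo)
    while True:
        block = blocks.get()
        if block is None:        # sentinel: end of stream
            break
        h.update(block)
    results[algo] = h.hexdigest()

def multi_hash(path, algos=("md5", "sha256"), chunk_size=1 << 20):
    """Read the file once and feed every block to all digests, so the
    slowest algorithm dominates wall-clock time and the rest are free."""
    queues = {a: queue.Queue(maxsize=8) for a in algos}
    results = {}
    workers = [
        threading.Thread(target=_digest_worker, args=(a, queues[a], results))
        for a in algos
    ]
    for w in workers:
        w.start()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            for q in queues.values():
                q.put(block if block else None)
            if not block:
                break
    for w in workers:
        w.join()
    return results
```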

Are any of the mainstream hash algorithms parallelizable?
Are there any non-mainstream hashes that are parallelizable (and that have at least a sample implementation available)?

Given that future CPUs will trend toward more cores with clock speeds leveling off, is there any way to improve the performance of file hashing (other than liquid-nitrogen-cooled overclocking), or is it inherently non-parallelizable?

Also, I hear that most current hash algorithms can be parallelized, but I'm not sure what that takes. Obviously, one way to do it would be to decide for yourself to hash each, say, 4 kB chunk of the file, and then combine the hashes somehow. XOR, perhaps? It's always dangerous cryptographically to invent your own algorithm, so I wouldn't trust this if you're defending against malicious data tampering rather than accidental data corruption. – sblom
I read the Skein specification you linked. What you suggest here is exactly how it achieves parallelization (apparently it's called "tree hashing"). Skein has a standard way to specify leaf size, fan-out, and maximum tree height, so that anyone using the same parameters gets the same hash result (that's important). I'd like to defend against malicious tampering as well as accidental corruption. I wish the standards were ready already. – DanO
tools.ietf.org/html/rfc1321 It seems MD5 is not easily parallelizable: the computation for each block depends on state computed from all earlier blocks. If this property did not hold, MD5 would not be secure (swapping blocks' positions would not affect the hash, which would be bad). I'm not saying parallelizing MD5 is impossible, just that no way to do it is apparent at first sight. – kgadek
@sblom What you're suggesting is similar to Merkle trees (hash trees), which are used by file-sharing software to ensure that the right files are downloaded from peers. en.wikipedia.org/wiki/Merkle_tree – MauganRa

3 Answers

12
votes

There is actually a lot of research going on in this area. The US National Institute of Standards and Technology is currently holding a competition to design the next generation of government-grade hash functions. Most of the proposals are parallelizable.

One example: http://www.schneier.com/skein1.2.pdf

Wikipedia's description of current status of the contest: http://en.wikipedia.org/wiki/SHA-3

7
votes

What kind of SSD do you have? My C implementation of MD5 runs at 400 MB/s on a single Intel Core2 core (2.4 GHz, and not the latest Intel at that). Do you really have an SSD that supports a bandwidth of 1.6 GB/s (four times the 400 MB/s a single core can hash, since your SSD sits at only 25% of capacity)? I want the same!

Tree hashing can be applied on top of any hash function. There are a few subtleties, and the Skein specification tries to deal with them by integrating some metadata into the function itself (this does not change things much for performance), but the "tree mode" of Skein is not "the" Skein as submitted to SHA-3. Even if Skein is selected as SHA-3, the output of a tree-mode hash would not be the same as the output of "plain Skein".

Hopefully, a standard will be defined at some point to describe generic tree hashing. Right now there is none. However, some protocols have been defined with support for custom tree hashing using the Tiger hash function, under the name "TTH" (Tiger Tree Hash) or "THEX" (Tree Hash Exchange Format). The specification for TTH appears to be a bit elusive; I find some references to drafts which have either moved or disappeared for good.
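To make the shape of the idea concrete, here is a toy two-level tree hash in Python. This is not TTH/THEX, nor Skein's tree mode, just an illustration: the leaf size and the inner function are exactly the kind of parameters that everyone comparing results must agree on, and that a standard would pin down.

```python
import hashlib

LEAF_SIZE = 1 << 20   # 1 MB leaves -- a parameter, not a standard value

def toy_tree_hash(path):
    # Level 0: hash each leaf independently; these computations share no
    # state, so they could be farmed out to as many cores as you like.
    leaves = []
    with open(path, "rb") as f:
        while True:
            leaf = f.read(LEAF_SIZE)
            if not leaf:
                break
            leaves.append(hashlib.sha256(leaf).digest())
    # Level 1: the root is the hash of the concatenated leaf digests.
    return hashlib.sha256(b"".join(leaves)).hexdigest()
```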

Still, I am a bit dubious about the concept. It is neat, but provides a performance boost only if you can read data faster than a single core can process it, and, given the right function and the right implementation, a single core can hash quite a lot of data per second. A tree hash spread over several cores requires sending the data to the proper cores, and 1.6 GB/s is not the smallest bandwidth ever.

SHA-256 and SHA-512 are not very fast. Among the SHA-3 candidates, assuming an x86 processor in 64-bit mode, some achieve high speed (more than 300 MB/s on my 2.4 GHz Intel Core2 Q6600, with a single core -- that's what I can get out of SHA-1, too), e.g. BMW, SHABAL, or Skein. Cryptographically speaking, these designs are a bit too new, but MD5 and SHA-1 are already cryptographically "broken" (quite effectively in the case of MD5, rather theoretically for SHA-1), so any of the round-2 SHA-3 candidates should be fine.
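If you want to check what a single core manages on your own hardware, a rough throughput measurement is easy with whatever digests your library exposes. A Python sketch (numbers will of course vary by machine; hashing in-memory data keeps the disk out of the measurement):

```python
import hashlib
import time

def hash_speed(algo, total=1 << 28, block=1 << 20):
    # Hash ~256 MB of in-memory zeros and report MB/s on one core.
    h = hashlib.new(algo)
    buf = b"\x00" * block
    start = time.perf_counter()
    for _ in range(total // block):
        h.update(buf)
    return (total / (1 << 20)) / (time.perf_counter() - start)

for algo in ("md5", "sha1", "sha256", "sha512"):
    print(algo, round(hash_speed(algo)), "MB/s")
```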

When I put on my "seer" cap, I foresee that processors will keep on becoming faster than RAM, to the point where the hashing cost is dwarfed by memory bandwidth: the CPU will have clock cycles to spare while it waits for data from main RAM. At some point, the whole threading model (one big RAM shared by many cores) will have to be amended.

4
votes

You didn't say what you need your hash for. If you're not going to exchange it with the outside world, but use it only internally, simply divide each file into chunks, then compute and store all the checksums. You can then use many cores by throwing a chunk at each one.

Two solutions come to mind: dividing files into fixed-size chunks (simpler, but smaller files won't use all the cores, where presumably you don't need all that power anyway) or into a fixed number of chunks (which uses all the cores for every file). It really depends on what you want to achieve and what your file size distribution looks like.
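A sketch of the fixed-size variant in Python, assuming the per-chunk digests are only ever compared against digests produced the same way (the chunk size and the function names are illustrative):

```python
import hashlib
import os
from concurrent.futures import ProcessPoolExecutor

CHUNK = 1 << 24   # 16 MB chunks -- tune to your file size distribution

def _hash_chunk(args):
    # Each worker process opens the file and hashes one slice; slices
    # share no state, so they parallelize cleanly across cores.
    path, offset = args
    with open(path, "rb") as f:
        f.seek(offset)
        return offset, hashlib.sha256(f.read(CHUNK)).hexdigest()

def chunk_hashes(path):
    # Store the whole offset -> digest map; there is no standard way to
    # fold the pieces into a single value that others could verify.
    offsets = range(0, os.path.getsize(path), CHUNK)
    with ProcessPoolExecutor() as pool:
        return dict(pool.map(_hash_chunk, [(path, o) for o in offsets]))
```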

If, on the other hand, you need hashes for the outside world, then as you can read in the other replies it's not possible with "standard" hashes (e.g., if you want to send out SHA-1 hashes for others to check with different tools), so you must look elsewhere: for example, compute the hash when you store the file for later retrieval, or compute hashes in the background with the "free" cores and store them for later retrieval.

Which solution is better depends on what your constraints are and where you can invest space, time, or CPU power.